Okay, thank you everyone. I think this is my first time at PyData, so thanks again to you and to Max for making it happen. They told me, why don't you give a talk, so I thought, okay, let's think about something. Just to start off: this is something I'll also be presenting at PyCon, so this presentation is more of a teaser for that one, which will have additional slides. It will be informal, so just ask any question and stop me. Max also told me that most of the crowd here is quite technical, and there is a lot of code in here, so really, stop me anywhere.

To start off: for any kind of machine learning, the first thing that comes to mind is data, and there are a lot of data sources out in the world. A simple workflow for preparing data for machine learning, and our company uses a very similar model, might have data in two different databases, which I show here as database server one and database server two. We also have a lot of external APIs, which we usually pay for, or sometimes free APIs, basically to enrich the data. So a usual machine learning workflow starts by consuming data: you have data ingestion from the API, from DB1, and from DB2. The next step that usually follows is aggregating the data, where you aggregate everything on the basis of some common field, or maybe several fields. After that comes the step where we clean the data, and then the data is ready to go for training.

Now I'll show an example of how scikit-learn fits into this whole flow and how we start. For this kind of workflow, a naive Python implementation would look like this. For the ingestion part, we read the data from the API and write it out somewhere as a CSV. We have database server one, so we read from it and write a CSV, and we do the same again for DB2. Then comes the aggregation; I'm showing an example using the pandas library. You read the data back and aggregate it using a pandas merge on, say, a field called id, and then you write the result out as something like data_aggregated.csv. The next step is preprocessing, which is another function that reads the aggregated data, handles things like missing values, maybe normalization and many other things, and returns the result. And then you want to run everything together.
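To make that concrete, here is a minimal sketch of the naive pipeline being described. The API URL, connection strings, file paths, and the id join key are all assumptions for illustration:

```python
import pandas as pd
import requests

def ingest_api():
    # Hypothetical external API; in the talk this is some paid or free service.
    data = requests.get("https://api.example.com/data").json()
    pd.DataFrame(data).to_csv("/tmp/data_api.csv", index=False)

def ingest_db(conn_string, query, out_path):
    # Same pattern for both database servers: read, then dump to a CSV.
    pd.read_sql(query, conn_string).to_csv(out_path, index=False)

def aggregate_data():
    # Merge the three sources on a common field, here assumed to be 'id'.
    api = pd.read_csv("/tmp/data_api.csv")
    db1 = pd.read_csv("/tmp/data_db1.csv")
    db2 = pd.read_csv("/tmp/data_db2.csv")
    api.merge(db1, on="id").merge(db2, on="id").to_csv(
        "/tmp/data_aggregated.csv", index=False)

def preprocess_data():
    # Stand-in for the real cleaning: missing values, normalization, etc.
    df = pd.read_csv("/tmp/data_aggregated.csv").dropna()
    df.to_csv("/tmp/training.csv", index=False)

def execute_data_preparation_pipeline():
    # The wrapper mentioned next in the talk: run every step, start to finish.
    ingest_api()
    ingest_db("postgresql://server1/db", "SELECT * FROM t1", "/tmp/data_db1.csv")
    ingest_db("postgresql://server2/db", "SELECT * FROM t2", "/tmp/data_db2.csv")
    aggregate_data()
    preprocess_data()
```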
So you write a wrapper method, execute_data_preparation_pipeline, which runs the whole data preparation pipeline, and you just run it. Okay, this works. Now, the problem we face is that we are bound to have errors; there are many kinds of issues. Your DB server goes down, or a third-party API suddenly makes some deprecation or some change, and everything starts failing. So how do you deal with that? Imagine in the previous code that the DB2 read fails. You've already pulled the data from the API, you're done with the DB1 read, and then this fails. Now you're thoroughly annoyed with yourself, thinking: I have to run it all again, the API read, the DB1 read, everything from scratch.

So how do you avoid that? Do we rerun the whole thing? Everyone will be thinking, oh my god, I have pulled in such a huge amount of data. Your DB1 could be something in HDFS, something like HBase, and you've read a huge amount of data into a CSV. Now imagine having to do that step again. It becomes really clumsy. So do we rerun the whole thing, or can we maintain state and go forward? How can we achieve that? We need the system to figure out exactly where it failed last and continue from the previous state. In our case we're already done with the DB1 read, so we have that data; we just need to restart the DB2 read, because there might have been a transient DB connection error, and continue from there. So we need to persist state.

Another point worth raising: say you start reading and writing into the DB2 CSV, and it fails partway through. You now have a data_db2.csv file, but that file is dirty. So you need to think about how to make this operation atomic, an atomic file operation. So let's think about how to skip the already-completed tasks and how to make writes atomic. We start writing code: check whether the path exists, read the data, and if we get an exception, remove the file. That takes care of the atomicity: if it's a dirty file, I just remove it.

And now the code becomes messier and messier. Imagine you also want to take an argument for this particular job, for example the date; you have to import the argument parser and read it, and the code gets clumsier still. We could keep going with this piece of code, but we also have to keep in mind: if any error happens, how do we know where it happened? We have to go digging to see that there is an issue in this part or that part. And we have no view of the progress; how do you visualize this?
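A sketch of the manual bookkeeping just described, with the reader for database server two left as a hypothetical helper: skip finished steps with an existence check, and remove a half-written file on failure so a rerun cannot mistake it for completed state:

```python
import os

def ingest_db2_with_state(out_path="/tmp/data_db2.csv"):
    # Skip this step entirely if a previous run already finished it.
    if os.path.exists(out_path):
        return
    try:
        df = read_from_db2()  # hypothetical reader for database server two
        df.to_csv(out_path, index=False)
    except Exception:
        # The file may be half-written ("dirty"); remove it so the
        # existence check above cannot treat it as a finished state.
        if os.path.exists(out_path):
            os.remove(out_path)
        raise
```

(A stricter version writes to a temporary file and renames it on success; as it happens, that is essentially what Luigi's file targets do for you, which is where the talk is heading.)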
You don't have any way to visualize this whole pipeline. It's just Python code that runs; if you're lucky you can watch the logs, and that's it. So I write this code, I don't know how to fix it, and I start crying the next day when I see it. Or I have my colleague here: what I can do to him is just go on holiday and ask him to deal with it. Those are the two options I'm left with.

Okay, now let's think about this whole flow from a different angle altogether. Those who are in the Spark world will have heard of something called a DAG, a directed acyclic graph. Can we think of this whole workflow as a graph? Let's picture the tasks again, each with its CSV output. You have the API read, whose output is a CSV called data_api.csv. The DB1 read likewise produces data_db1.csv, and DB2 its own file. These three become the inputs of the data aggregation, which in turn outputs data_aggregated.csv, and it continues like that. What we're seeing is that these files become your states, exactly what we were doing by hand with the "if this exists and that exists" checks that made the code clumsy. But we can use this concept of a DAG instead.

So there was a tool Spotify came out with, which I'll get to in a minute. Does anyone remember Super Mario Brothers from childhood? Mario had a brother called Luigi; they were both plumbers, and as far as I remember they even won something like a plumber-of-the-year award. So, do we need a plumber? Obviously we do: all these jobs, each and every operation, need to be connected well, stitched together, so that everything follows one cohesive flow from raw data to training data. What Spotify did was come up with a framework for exactly this: to plumb together the typical tasks you run, for example a batch process with a thousand tasks that are interdependent among each other. They came up with a Python module called Luigi, which is nothing but this plumber, and it's open source and actively maintained.

Some of its features: dependency resolution, meaning it maintains these dependencies and achieves what we were trying to do with the os.path.exists checks; it's basically the boilerplate to help you achieve that. It has a visualization, which I'll show in the demo; I also have a snapshot of it. Another good part: if a task fails, there are parameters and configuration, so you can say, retry two times. If your DB goes down for two minutes, you can set a retry delay of, say, three minutes, and if the DB2 task fails, Luigi will come back after three minutes and retry it. It also has command-line integration, in the sense that you run it very much like you run a normal Python .py file. So that's Luigi; you can read more on the Luigi GitHub page.
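The retry behaviour mentioned above is driven by configuration. A rough sketch of what that looks like; the exact option names have shifted between Luigi versions, so check the docs for the version you run:

```
# luigi.cfg, illustrative only
[scheduler]
retry_count = 2     # how many times the scheduler re-runs a failed task
retry_delay = 180   # seconds to wait before retrying, e.g. roughly 3 minutes
```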
Most of this is shamelessly copied from the Luigi GitHub page, all of it. So, how does a Luigi task look? The first thing you see is the Parameter. A parameter could come from another task, or it could be something you give when you run the command: for example a date, or how many parallel jobs or worker processes you want to use, something in those terms. requires is what creates a dependency. In requires I say: I depend on some other task; that task has to execute first, and only after that am I run. So you don't have to go and hand-write that sequencing, ingest data then aggregate data. You just say the aggregate-data task requires the ingest-data tasks, and the preprocess-data task requires the aggregate-data task. This gives you boilerplate that's easy to maintain: you don't have to go on holiday, you don't have to cry over your messy code, your team is happy. The run part is where the main execution happens; that's where your business logic lives. requires says what I need, and the output part says: once the task completes, this is the state I want to maintain. In our example we're reading data and dumping it to a CSV, so that CSV becomes our output, the state. It could also be something else, as we'll see later: anybody who uses scikit-learn knows you can save a model by serializing it into a pickle object.

Okay, so let's start Luigi-fying the data preparation pipeline we had and see what happens. This is what the API ingestion now becomes: I call the API with some input URL and other things, I get the data, and I declare, this is my output, I want to store it there; then I just write what I get, via a StringIO, into that output file. It's similar for the database ingestions, DB1 and DB2: I read and I write. These three are independent tasks; you'll notice they don't have a requires part, because if you look at the graph, these three are independent.

Now let's come to the data aggregation part. Aggregation depends on those three, which you'll see I've written with yield, so requires is a generator now. What Luigi will do is finish those three tasks first, and after that, all the dependency files we've created will be lying in the /tmp folder for now. Then we read all of it back: the output of the API ingestion and of the two database ingestions, and since I'm using pandas here, we have three data frames. So I aggregate the data, since that's the business logic, then I declare my output target, convert the result to a CSV, and dump it again. I've now stitched three tasks together into the data aggregation part, and my job runs successfully up to aggregation.
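A condensed sketch of what those Luigi-fied tasks might look like. The class names, paths, the id join key, and the fetch_dataframe_from_api / read_from_db1 / read_from_db2 helpers are assumptions standing in for the slides' code:

```python
import luigi
import pandas as pd

class IngestAPIData(luigi.Task):
    # Parameters can come from the command line or from a requiring task.
    api_url = luigi.Parameter(default="https://api.example.com/data")

    def output(self):
        # The output file doubles as the task's persisted state.
        return luigi.LocalTarget("/tmp/data_api.csv")

    def run(self):
        df = fetch_dataframe_from_api(self.api_url)  # hypothetical helper
        with self.output().open("w") as f:
            df.to_csv(f, index=False)

class IngestDB1(luigi.Task):
    def output(self):
        return luigi.LocalTarget("/tmp/data_db1.csv")

    def run(self):
        with self.output().open("w") as f:
            read_from_db1().to_csv(f, index=False)  # hypothetical reader

class IngestDB2(luigi.Task):
    def output(self):
        return luigi.LocalTarget("/tmp/data_db2.csv")

    def run(self):
        with self.output().open("w") as f:
            read_from_db2().to_csv(f, index=False)  # hypothetical reader

class AggregateData(luigi.Task):
    def requires(self):
        # The three independent ingestion tasks must finish first.
        yield IngestAPIData()
        yield IngestDB1()
        yield IngestDB2()

    def output(self):
        return luigi.LocalTarget("/tmp/data_aggregated.csv")

    def run(self):
        api, db1, db2 = (pd.read_csv(t.path) for t in self.input())
        merged = api.merge(db1, on="id").merge(db2, on="id")
        with self.output().open("w") as f:
            merged.to_csv(f, index=False)
```

Note that writing through output().open() gives you the atomic write-to-temp-file-then-rename behaviour for free.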
Next is the data preprocessing part, where for machine learning you might normalize, handle missing values, and do many other things with your data; this is where that logic comes in. We say the aggregated data has to be there, so this triggers the aggregation task, which in turn triggers the other three. After that we come to the main business logic. You can imagine you have a class with, for example, a static method called pre_process; you call it, passing in whatever parameters you want, maybe the data frame itself. It gives you the output, and you write the processed data frame out to a CSV file. So now we have training.csv written.

So now let's see how Luigi trains a machine learning model on top of this. The data is clean: in the preprocessing part you can do things like one-hot encoding of the categorical variables, handle them there, and produce a proper flat table you can feed directly to a classifier or regressor, and it will give you a model. After that step comes the train part. Training again requires the data preprocessing, so I have that in requires. In the run part I just call a method we have that internally fits, for example, a random forest, whatever it may be. Since I depend on the preprocessing, I read the data frame from the preprocessing output path and pass it in. It gives me back the model, and I maintain my output as a pickle object; I just do a dump. So we have this train part end to end.
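Continuing the sketch above, the preprocessing and training tasks might look roughly like this; the pandas one-hot encoding and the train_random_forest wrapper are stand-ins for the talk's business logic:

```python
import pickle

import luigi
import pandas as pd

class PreprocessData(luigi.Task):
    def requires(self):
        return AggregateData()  # pulls in the whole upstream chain

    def output(self):
        return luigi.LocalTarget("/tmp/training.csv")

    def run(self):
        df = pd.read_csv(self.input().path)
        # Stand-in for the real logic: drop missing values and one-hot
        # encode categoricals into a flat table a model can consume.
        df = pd.get_dummies(df.dropna())
        with self.output().open("w") as f:
            df.to_csv(f, index=False)

class Train(luigi.Task):
    def requires(self):
        return PreprocessData()

    def output(self):
        # The fitted model, persisted as a pickle, is this task's state.
        # format=Nop keeps Luigi from treating the stream as text.
        return luigi.LocalTarget("/tmp/model.pickle", format=luigi.format.Nop)

    def run(self):
        df = pd.read_csv(self.input().path)
        model = train_random_forest(df)  # hypothetical scikit-learn wrapper
        with self.output().open("wb") as f:
            pickle.dump(model, f)
```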
So this is how it starts. Once I kick it off, and I'll show this in the demo, you can see all the tasks starting up. It will first check whether Train is complete, whether data preprocessing is complete; it checks all the tasks. I can also pass workers; there are a lot of config parameters and things you can play around with in Luigi. Say I set the number of workers to five: five workers will start asking Luigi for work in parallel. Luigi, give me work. One worker gets told, start with the train data ingestion; the next one picks up the store data ingestion, and now there are fewer workers left. The example I'll show is taken from Kaggle; those of you into Kaggle will remember a competition from a few months back called Rossmann Store Sales, where you have to forecast sales. I took that data as the example, and I have the code; I'll show the demo.

So now, out of your five workers, two have already started, and the other three have no work to do yet, as you can see here. This is Luigi's visualizer; this is how it looks. You can see the store data ingestion reading the store data, and the training data ingestion reading the training data, which is basically the train data Kaggle gives you. They have a common field called store, and on the basis of that I'm aggregating the data. You can see these three tasks are done, and there's a pending task called data preprocessing; we'll watch live how it gets saved. And you can see the train task is still pending, because it's waiting for the preprocessing to finish. This down here is the log, but you don't really need to read the log; the much easier way to see it is the visualizer, which is usually available on port 8082 of the machine. You can change the port if you want. And it's only available when you run Luigi with the central scheduler, meaning there is a single Luigi process, a single process ID on that machine, and every task you run, say a thousand machine learning pipelines, shows up on this single page. We'll see how to run the whole thing: you can do luigid with an ampersand, or you can use luigid --background, which runs the luigid process as a daemon in the background.

It does have some limitations. They don't have a built-in scheduler yet; I think they're considering one for a next version. The second thing I felt was a bit lacking, which they are also still thinking about, is distributed execution: you give five workers, but what if those workers could run separately on different machines in a cluster? That support isn't there yet, but I'm pretty sure that's where the future of Luigi lies: distributing tasks to different machines. For example, if you're using Spark, you could install Luigi on the master of a Spark cluster and just run, and the five or ten slaves could in turn trigger work on the masters of another cluster, and it continues. And hopefully things work.

Okay, so let's see a small demo; it's nothing very fancy. Before that: I'll definitely share the slides, and the code is written up there. You can install it and play around; there are some known issues you'll find, which you'll need to experiment with. So, to the demo. This is the code we have. First we'll run it with the local scheduler. What I've done is create a virtual environment inside the project itself, and it has just a requirements.txt; everything is maintained, so you can figure it out, play around, and see what else you can do. There's a module called luigi_ml_pipeline, the main Python module that runs, and it has all the code, the same you saw in the slides. The main task we need to trigger is Train.
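For reference, the commands involved look roughly like this; the --background flag and the default port 8082 are from the Luigi docs, while the module and task names are this demo's:

```sh
# Start the central scheduler as a background daemon (web UI on port 8082)
luigid --background        # or simply: luigid &

# Run the pipeline against the central scheduler with five workers
python luigi_ml_pipeline.py Train --workers 5

# For development, skip the daemon and use the in-process scheduler
python luigi_ml_pipeline.py Train --workers 5 --local-scheduler
```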
If you see the shell script here, what I tell it is: run Python, here's the module, the number of workers, and the task I need to run; so I say Train. One line people usually miss is luigi.run(), which is really important, because without it nothing will run and you'll be left trying to figure out why. I had that nightmare once.

Okay, first let's see it with the local scheduler. If you see, I have run.sh, which I just need to run. Oops, sorry. So you can see, this starts working; it will eventually say there are no more tasks, and you can just follow what it prints. It asks: what are the tasks that are pending? This is the point where it's basically drawing the graph: these are two tasks that depend on this one, this one depends on that. Very similar to the Spark DAG we get for the usual RDD execution. It draws the graph, so now it knows exactly where to start from, where to end, and how to process. So this runs, and you can see I've printed the data frame; this is the variable we're trying to predict, sales for example. You can see the one-hot encoding is already done, all the categorical variables taken care of; you can see Assortment a, b, c. That's the data preprocessing in the logs. It continues, and since I'm using a random forest, you can see there's one worker continuously building the trees now, building tree this, tree that; it will continue.

One thing to note: when we run with the local scheduler, you cannot see the DAG visualizer, the Luigi visualizer you saw in the slide. For that you have to run with the central scheduler, which basically means running luigid as a background daemon process. Hopefully it's done. Okay, it says the progress looks :) because there were no failed tasks. It shouldn't fail; it's a very simple thing. Now we want to see the graphs and all. Since we're maintaining the dependencies in /tmp, you can see the pickle object and everything saved there. So we just remove all these files, because otherwise the job won't run; it will say it's already done. Actually, I should have shown that: if I rerun the Luigi job as-is, it will say all the dependencies are maintained, all tasks look okay. I'll show that again in a moment. So let's start the luigid process. Okay, this is the Luigi UI. And notice we don't see the tasks we ran just now, because they used the local scheduler, not the central scheduler. So now we go to the shell script and remove the local-scheduler flag; I don't need to run with the local scheduler. The local scheduler is actually good for development purposes, when you want to debug and see what the issues are, but when you run in a production environment it is always better to run against a central scheduler.
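Since that easy-to-miss luigi.run() line is what wires the module to the command line, here is the minimal footer it implies; the module name is the demo's, the rest is a sketch:

```python
# luigi_ml_pipeline.py
import luigi

# ... task classes (IngestAPIData, AggregateData, PreprocessData, Train) ...

if __name__ == "__main__":
    # Without this call, `python luigi_ml_pipeline.py Train ...` defines
    # the tasks and silently exits: the "nightmare" mentioned above.
    luigi.run()
```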
So yeah, let's go here. We run it with the central scheduler. It starts off similarly, but let's see what happens here. You see that now all the tasks you run appear, and you can easily see the graphs: this is the state, these two are done, the aggregation part is left, these are the tasks waiting on it. You just do a reload and you see the data preprocessing; you go back and you can see which tasks are pending and which are done. Hopefully this is done. Okay, it's still going. What we'll try to do, the thing we said earlier: how does Luigi create the DAG, make sure there are no duplicate tasks, and keep all the dependencies maintained? We'll rerun it afterwards, and Luigi should ideally say, your tasks are done. Still running.

So I can use this time for something called the configurations. Luigi has a lot of configuration. In my code base you can see I've maintained a client.cfg. You can keep all the historical tasks that you run: Luigi can use SQLite to store every task you have run, and you configure the DB connection and all these things, and where to store it. For example, say I suddenly kill a task: Luigi will persist that task's state as a pickle there, and when it runs again, it reads it back, sees how far it had executed, and continues from there. You have to set record_task_history; we'll see again in the UI where the history shows up. Another thing you can configure is the error email. We can go to the documentation and you can see it there. If any error happens somewhere you don't know about, you can have mail sent automatically: if a task fails, you get a proper error report with the code error, exactly where it failed and everything. Also, if anything fails here in the UI, you'll see a kind of red dot, and if you click on the red dot, the Python error automatically comes up.

Okay, the tasks are done. So let's see what happens if I run again. It says the execution looks okay: that means it has already drawn the graph and checked all the dependencies, whether they're available or not, and since they are, it just says the tasks are done. Now, you can ask me: if this job has to run hourly, how do you do it? The simple way is to take a parameter, for example an hour epoch, or a datetime along with the hour, and whenever you create the dependency files, name them with it, like file_year-month-day-hour. That way each hourly run maintains its own state, and you can run hourly jobs this way. So that's the config part; in the configurations you can see a lot more things.
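An illustrative client.cfg along the lines described; option names have moved around between Luigi versions (newer ones use an [email] section, for instance), so treat this as a sketch:

```
# client.cfg, illustrative only
[core]
error-email = team@example.com    # mail a traceback whenever a task fails

[scheduler]
record_task_history = true        # keep a browsable history in the UI

[task_history]
db_connection = sqlite:///luigi-task-history.db   # SQLite store of past runs
```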
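And the hourly-run trick from a moment ago, sketched with Luigi's built-in DateHourParameter; the task, path, and fetch_data_for_hour helper are assumptions:

```python
import luigi

class HourlyIngest(luigi.Task):
    # Parsed from the command line, e.g. --hour 2016-07-14T09
    hour = luigi.DateHourParameter()

    def output(self):
        # Baking the hour into the file name gives every hourly run its
        # own state, so a new hour never collides with an old one.
        return luigi.LocalTarget(self.hour.strftime("/tmp/data_%Y%m%dT%H.csv"))

    def run(self):
        with self.output().open("w") as f:
            f.write(fetch_data_for_hour(self.hour))  # hypothetical fetcher
```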
And a good part of Luigi is that it has a lot of integrations with Hadoop. Spotify actually started this project to maintain their Hadoop jobs, because they have a lot of them; you can use the Hadoop job tasks and many other kinds of jobs. So you can say: I pull the data using a Python pipeline, I dump the data, and now I trigger a Spark job, which takes the data from there, and so on. You can connect a lot of tasks; it's not as if you have to rely on a Python machine learning library. For example, you can use Spark MLlib, have your Scala code, and do a spark-submit from Luigi. Look at the luigi.contrib packages, as far as I remember; people are contributing a lot of modules. Even the SQLAlchemy integration is in there now. There's something called CopyToTable: say you make your predictions and you want to push them back to a database; you have luigi.contrib.postgres and all these things. From the external world they are contributing like anything, so always have a look at what's there; they now support BigQuery, Redshift, and many other things. It becomes as simple as: instead of subclassing luigi.Task, you subclass, for example, the MySQL CopyToTable task, and you just provide the data without having to think about anything else. So this becomes super easy to maintain.

Apart from that, feel free to play around with this thing, play with the configuration. I have the build script written there, which automatically takes care of everything, creating the virtual environment and so on. Especially for the Unix folks with a bash shell: just run build.sh and you're good to go. You can do a lot of things: you can switch models, play around with the feature builder, play around with the regressor doing the regression part. The only concern most of the community is facing right now, among those using Luigi with scikit-learn, is grid search and cross-validation. If you switch this train model over to grid search right now, it will throw an error, specifically when you use the central scheduler. There's some issue with the process ID assignment: when the luigid process runs, it assigns a process ID, and GridSearchCV then tries to create multiple threads and processes of its own, and somewhere that clashes. There's a ticket already open, and you can see the issue on Stack Overflow; it's something they're trying to solve. With the local scheduler this won't be a problem, but with the local scheduler you lose the visualization of what's running. So for now, do your hyperparameter optimization for the machine learning part somewhere else, and then plug the result in. That's the one drawback of Luigi I've seen so far. Okay, back to the presentation.
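As an aside, pushing predictions back to Postgres with the contrib module mentioned above looks roughly like this; the connection details, table layout, and the upstream Predict task are assumptions (and it needs psycopg2 installed):

```python
import luigi
from luigi.contrib.postgres import CopyToTable

class PublishPredictions(CopyToTable):
    # Connection and table details are class attributes on CopyToTable.
    host = "localhost"
    database = "analytics"
    user = "luigi"
    password = "secret"  # in practice, read this from configuration
    table = "predictions"
    columns = [("store_id", "INT"), ("predicted_sales", "FLOAT")]

    def requires(self):
        return Predict()  # hypothetical task writing a predictions CSV

    def rows(self):
        # Yield one tuple per row to copy into the table.
        with self.input().open("r") as f:
            next(f)  # skip the CSV header
            for line in f:
                store_id, sales = line.strip().split(",")
                yield store_id, sales
```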
I think I'm almost done; it was meant to be short. These are some of the links, so you can just start off with those and build something small. The only thing you need to keep in mind is: when you define a task, don't make it huge. You could wrap 57 steps together and say that's one task, but that defeats the whole purpose of Luigi. Try to keep things as small, focused tasks, so that if anything fails, you know exactly where it failed. Apart from that: at my company we've been using Luigi quite a lot, and we already have pipelines live. For example, in our industry we deal with a lot of fraud, so we have the fraud pipeline fully controlled by Luigi, running as hourly jobs and all these things. So yeah, I think that's all from my side. I'll share the slides; any questions on anything, on any slide, feel free.

[Host] Any questions for our speaker? Any questions?

[Audience] You've been running this code in the background, right? What was the prediction problem you were running?

Sorry, I didn't get your question at first. Prediction of sales, yes; we can go into the code part. Oh, it actually already ran. It took a while because I'm running a random forest with five jobs and only around 100 trees, so it took a fair amount of time, but Luigi has nothing to do with that; it's more of a scikit-learn thing. Is that your question?

[Audience] It's more about the processing code; I'm asking about the results.

Oh, okay. There's a disclaimer in the GitHub repo: this was never meant to compete on Kaggle and win the money or something; it's just to string the tasks together and show how to use them. You can actually see in my code, in the feature builder, which fields I'm using, and I'm pretty sure you cannot win a Kaggle competition with four fields. I'm not pushing for accuracy at all.

[Audience] I see you handle dependencies with the framework, but how do you handle it when something fails? It might fail again.

Okay, so there are parameters here; you can check in the configuration, say how many times you want to retry. If your code itself is erroneous, it's erroneous; no one in the world can solve that. But if it's something transient: think about a later step where you create the model and, as many companies do, push that pickle object to a Redis store, a NoSQL store. Now suppose Redis is not alive at that moment; something went wrong. In a normal world that task just fails. But with Luigi you can tell it: try ten times. It will keep trying to push the pickle object. So in terms of failure, you either succeed or you get a mail saying this task cannot be completed due to this issue, a very detailed mail from the Luigi framework itself, and you can configure many other things there.
But yeah, if it fails due to an error in your own code, I don't think any framework can solve that.

[Audience] Before you chose Luigi, you probably went around and looked at different solutions as well. What were the advantages and disadvantages versus other data pipeline frameworks?

Okay, so regarding this, what I can tell you is: AWS has something called AWS Data Pipeline, if any of you have used it. You can create a cluster, process the data, and so on; it's very similar to this. But that's a paid solution; whether you want to pay for it is up to you, and the configuration is in JSON and not super easy to work with. A similar thing, which I haven't used myself, but some of my friends were telling me about, is Airflow; I'm not really sure how good Airflow is or whether it does exactly the same thing, but it may well. And for those in the Clojure world, there's something called Onyx, which similarly describes the data pipeline as a data structure: you can say, this is the Kafka piece, and so on, and just compose the pipeline. There are a lot of solutions out there; since this is PyData, I'm just talking about Luigi.

[Host] One last question? Anybody? Right, let's give our speaker a round of applause. Before we take a break, here's our second speaker.