I'm going to be giving stuff away, so pay attention and you may win something, all right? What? Is it under the chair? Yeah. I'm going to give away this book, my phone, some t-shirts, and other stuff. We still have a few minutes before we start. Any questions? What are we going to learn today? Today we're going to learn how to do deep learning with Apache Spark. It's a secret, don't tell anyone, please. Venezuela? Yeah, far away from here. No, I came from Mexico. Still far away. What, sorry? It's about the package that lets us do deep learning with Apache Spark, and something I created too.

By the way, all of this will be theory and code, and all of the code is already on GitHub. I'm going to give you the link very soon, so you can follow along or just watch it later. The slides are not there yet because for some reason they're not OK; I'm going to fix them and put them on the repo.

So we have two minutes. I have stickers too and more stuff, so at the end, please come to me, and the first ones here are going to win some stickers and things. Don't run. I even have some camera stuff too, a couple of pouches; I don't want to carry so much stuff. This is the best session ever. I have one sticker that says "Who's your data?", and stuff like "Data to the people." I have some Hadoop ones, and some for women too, like "data women." That's it, I don't have any more genders out here. All right, we have one minute. I actually overslept, I woke up at 9, I'm sorry. I wanted to go to everything, and to your talk too; I'm so sorry I was not there. I wasn't sleeping, I'm just super jet-lagged.

All right, I'm going to start right now. So hello, everyone. I'm so happy to be here with you. First question for everyone here: who knows Spark? Who knows Apache Spark? OK, who's been coding with Apache Spark? Who's been doing deep learning with Apache Spark? Awesome, no one. All right, so everything here will be new for you.

So, what you'll learn today; that was the first question I was asked. First, why would you do deep learning with Apache Spark? Then, what is Apache Spark? If you don't know it, I'll tell you here. What do you need to know about deep learning for this talk? Almost nothing. Then MLlib, which is the library for doing machine learning with Spark; then deep learning with Apache Spark; and next steps. This is the link to the GitHub repository; all the code is in there, plus some extra information. If you want to follow along, take a photo or copy it right now. I'm going to give you three seconds.

All right, so why would you do deep learning with Apache Spark? Anyone have anything to say about this? Sorry, I couldn't hear you. Oh, that's a good answer. Someone else? Parallelize? Scale? OK, good, those are great answers. And that's actually one of the reasons we do deep learning with Spark: Spark is an awesome framework for distributing your computations over a cluster, and it's so easy to do. You have an API for Python, for Scala, for Java, and for R. It's becoming a standard, I think, in every country and every company right now. And it would be amazing to add the great stuff of deep learning to Spark, right? There are parts of deep learning that are very hard, like computing the parameters of your networks; that can be very computationally expensive.
And it would be amazing to distribute that process over a cluster, right? And I think Spark right now is the easiest way to distribute your workload over a cluster. So I'm going to show you a package created by Databricks, the creators of Apache Spark, that allows you to do this.

First, we need the basics of Spark, in case you don't know it. So, what is Apache Spark? Before going there: I created this timeline, it looks kind of weird, but it's the history of Apache Spark; you can find it in one of my blog posts. The beginning, for me, was the 1998 paper from Brin and Page at Google, the search engine paper. It's an amazing paper, check it out, because it's the paper that in the end translated into MapReduce. And MapReduce is what we know as the real beginning of big data. There were some issues with MapReduce, though. Yahoo created Hadoop, the open source version of MapReduce, but the issue was that it ran on disk, OK? So it was not that fast. It was an awesome framework, but not that fast.

And in 2009, there was a guy called Matei Zaharia, an awesome guy, and he wrote his thesis on a new way of doing MapReduce in memory. He called this paper RDDs, or Resilient Distributed Datasets; you can find the paper online, and his thesis too. And it was the beginning of Apache Spark. Spark began to grow inside Berkeley and other schools, and then Apache was like, OK, I need to have this in my portfolio, and it became an open source Apache Foundation project. Just a year or two later, it became one of the most important Apache projects, and right now it's the big data framework with the most contributors in the world, right? After that, the next part that interests us for this talk is that in 2017, last year, the people at Databricks created something called Deep Learning Pipelines. And Deep Learning Pipelines was the beginning of this story of Spark and deep learning.

So what's Apache Spark? In simple words, it's a fast and general engine for large-scale data processing. It's general because you can do a lot of stuff with it, like distributed SQL, machine learning, or graph analysis. It has an API for Python, Scala, R, and Java. And it's awesome, all right?

One of the other good things about Apache Spark is that it connects very easily with the stuff you already have at your company, like SQL Server, MySQL, Postgres, Cassandra, or HBase. And you can distribute your computations on a Mesos cluster, or a YARN cluster on Cloudera. They did this because they didn't want to force too much change: if you create something new, and people need something completely new to distribute their computations, it's very annoying for them to actually adopt it. So they made it very easy for companies to adapt this technology. Normally you'll run it on a Cloudera cluster with YARN, and you'll interact with it through Jupyter. You can also interact with it from your IDE or whatever, but I'm going to show you here how to do it with Jupyter.

So Spark has a core, written mostly in Scala, that has all the instructions for transforming your data in a distributed fashion. And it has several components on top that are very important: Spark SQL to distribute your SQL code over a cluster, Spark Streaming to work with streaming data, MLlib to work with machine learning, and GraphX to work with graphs. We're going to focus today on MLlib.
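To give you an idea of what connecting to one of those sources looks like, here is a minimal sketch, assuming you already have a SparkSession called `spark`; the URL, table name, and credentials are made-up placeholders, not anything from the talk:

```python
# A sketch of reading a table from an existing Postgres database.
# Every option value below is a placeholder.
df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb")
      .option("dbtable", "public.customers")
      .option("user", "analyst")
      .option("password", "secret")
      .load())

df.show(5)  # same DataFrame API, but the read is distributed
```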
Spark has three main data abstractions. The first one was the RDD. RDDs are distributed collections of JVM objects: they live on your cluster and they distribute your computation. You can store arrays, strings, objects, whatever you want in an RDD. They basically work in a transformation-action paradigm: you can chain transformations all you want, and only when you run an action does the computation actually happen. And they're immutable, so if you really want to keep the changes, you save the result as a new RDD.

After the RDDs, data frames were created. Data frames are like what we know from pandas or R, but distributed. Since Spark 2.0, data frames are a type of dataset; datasets are like data frames but with types, and they only work in Scala and Java, so we're not going to use them here. Just so you know, a data frame is a Dataset of Row. They're very cool and they're optimized: there's a project called Catalyst that helps your code run even faster on a cluster. It's like a tiny compiler for Spark.

All right, so let's go to the fundamentals of Apache Spark. How do you start writing Spark code? The first thing you need to do is create a Spark context. Right now it's actually called a Spark session, but this is the thing that connects your Spark code in Python with the JVM, and through it you get access to the cluster, all the worker nodes and the master node. If you see here in blue, we have the Python code, and in this cream color, the Java stuff. Py4J is the library that connects Python with Java. The PySpark shell, and Optimus, something I'll show you later, will create this Spark context and Spark session for you; but if you're running a vanilla Jupyter installation or something like that, you'll need to construct it yourself. As I told you before, Spark used to have different contexts, like the Spark context, the SQL context, and the streaming context. Since 2.0, they changed that to a Spark session, so now you just start a Spark session and get the other contexts from it.

All right, so this is awesome. There's a question. The context is only there to make the cluster functionality available to your Python code, because the Python API is, in the end, a wrapper around Scala and Java. The cluster runs on the JVM, so you need something to connect the two; it's like a socket. And yes, in Java you still need a Spark context, because Spark needs to know where the master is: where's my master? Am I running on a cluster? You have to define which master you have and how to connect to the cluster. So in any language, Scala, Python, or Java, you need a context. They work a bit differently in each language, but you need one in all of them. Is Py4J only for Python and Java? Yes, Py4J is only for connecting Python and Java.
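To make that concrete, here is a minimal sketch of starting a session by hand, the way you would in a vanilla Jupyter installation; the app name and master value are just examples:

```python
from pyspark.sql import SparkSession

# Since Spark 2.0 you start one SparkSession; the other contexts
# hang off of it.
spark = (SparkSession.builder
         .appName("deep-learning-with-spark")
         .master("local[*]")  # or your cluster's master URL
         .getOrCreate())

sc = spark.sparkContext  # the classic SparkContext, if you need RDDs
```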
All right, so this is deep learning in three slides, OK? The deep in deep learning is not a reference to a deep understanding of anything, or to some different way of understanding things. It's just the idea of successive layers of representation; that's the deep part in deep learning. And the learning part, in the context of machine learning and deep learning, means an automatic search process for better ways to represent your data. I say this because a lot of people think deep learning is a way of mimicking how our brain works, and that's not true. We actually have no idea how to connect the brain stuff with the deep learning stuff; the name is just there because it sounds similar. The learning part is the process of learning to represent the data in a better way, so we can understand it; and deep only means that you use deep neural networks, networks with several layers. That's it. So deep learning is just the automatic search for representations using deep neural networks. I'm not being very precise here; I just want you to have some context for what deep learning is and how we're going to use it. We're not going to create neurons in our code or anything; we're only going to do really high-level things with Spark.

All right, so, MLlib. Who's used MLlib before? OK, great. MLlib is an awesome library that helps you run your machine learning code on your cluster, and it contains a lot of great algorithms for regression, classification, clustering, and different things like that. It's the machine learning part of Spark. Right now there's no specific plan to merge the deep learning part into MLlib, but it's in the discussions of the community; maybe that will happen in the near future. Because right now, as I'm going to show you, the deep learning part lives in a different package on GitHub, open source, but not connected to the actual core of Spark. And you can run MLlib wherever you want: on Hadoop, Mesos, Kubernetes, or standalone on your cluster, with different resources.

So now I'm going to show you the demo. I have half an hour, so it's awesome. This demo will cover transfer learning with Spark, how to apply deep learning at scale, how to use Keras and TensorFlow with Spark, and how to deploy your model in SQL, all right? As I told you before, you can go to the code now and you'll see all of this, with more information. It's actually from a blog post I wrote like two months ago.

So, the first thing I want to show you, in case you have no idea how MLlib works; I'm not going to run all of this, just some parts, so you get an idea of how MLlib works. If you want to start a Spark session, this is what you have to do: from pyspark.sql import SparkSession. And when you do this, you create two instances. One is spark, the thing that lets you create data frames and stuff; and if you need to create something like an RDD, you use the sc variable here. Sorry? No; I mean, there's a way of configuring the notebook so that when you start a Python kernel, it already has all of this for you. If you don't do that, you need to do this to create the context. So here, if I type sc, I see the Spark context.
And if I click on the UI, OK, the UI here is something very interesting where you can see the processes running in Spark: jobs, tasks, the environment. So I'm going to start. This example was taken from some internet resources, so it's not mine; it's just to show you how to use Spark. We're going to try to predict the probability of an infant's survival, that is, whether it will be born dead or alive. It's kind of sad, but it's machine learning too.

All right. The first step is to read the data. And if you're using Spark, I really recommend creating your schema by hand. This is very important. You can let Spark infer the schema too, but I recommend typing it out: it's not that hard, you only do it once, and you'd do the same thing in SQL. So right now I'm defining which variables I want: this is the name of the variable and this is its type, and these types are the Spark types, which I import from here, all right? After that, I create a StructType of StructFields. It looks kind of weird, but it just assigns each of these names its type, and the False here means the column is not nullable, right?

OK, so then I read the data. read.csv is very simple, very similar to pandas or R; I pass it the dataset, I say the header is true, and I want it to use my schema. So this runs, and it worked. The show function lets you see the data frame; it looks terrible, but it's there, so you can see something about your data set. And you can select things with .select; it's like SQL code. Honestly, for me, now I only use Spark; I don't use pandas anymore. I think if you come from SQL, Spark is much easier, because these selections and filters and group-bys feel natural. And the other good thing is that the same code runs on your computer or on your cluster. After building the experiment on your laptop, you don't need to think about how you'd run it on the cluster.

Now I'm not going to explain everything here; I'm just going to tell you that I create a one-hot encoder for one of the variables, and I hope you know what that is. And then something very important: Spark works in a way that, for predicting, it only understands all the features packed into one single column, as a vector. So if you want to predict y from several variables, you need to transform those several columns into one column holding a vector with all of that information. If you have three columns with the values one, two, and three, you create a new column containing the vector [1, 2, 3]. You're assembling a vector here, and Spark needs that. This is the code to do it, and I'm going to call that column features.

Then I'm going to run a simple logistic regression; it says regression, but it's actually a classifier. I'm trying to predict whether an infant will be alive at birth. So I import it, I choose some more or less arbitrary values for the model's parameters, and I run it; this is the label I want to predict. And then there comes the notion of a pipeline.
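As a rough reconstruction of those cells, here is a condensed sketch of the whole flow: schema, read, encode, assemble, model, pipeline, and evaluation. The column names and parameter values are illustrative, not necessarily the exact ones from the notebook:

```python
from pyspark.sql.types import StructType, StructField, IntegerType
from pyspark.ml.feature import OneHotEncoder, VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# The schema, typed by hand: name, Spark type, nullable.
schema = StructType([
    StructField("INFANT_ALIVE_AT_REPORT", IntegerType(), False),
    StructField("BIRTH_PLACE", IntegerType(), False),
    StructField("MOTHER_AGE_YEARS", IntegerType(), False),
])

births = spark.read.csv("births.csv", header=True, schema=schema)

# One-hot encode a categorical column, then assemble all features
# into a single vector column called "features".
encoder = OneHotEncoder(inputCol="BIRTH_PLACE", outputCol="BIRTH_PLACE_VEC")
assembler = VectorAssembler(
    inputCols=["BIRTH_PLACE_VEC", "MOTHER_AGE_YEARS"],
    outputCol="features")

lr = LogisticRegression(maxIter=10, regParam=0.01,
                        labelCol="INFANT_ALIVE_AT_REPORT")

pipeline = Pipeline(stages=[encoder, assembler, lr])

train, test = births.randomSplit([0.7, 0.3], seed=666)
model = pipeline.fit(train)
predictions = model.transform(test)

evaluator = BinaryClassificationEvaluator(
    rawPredictionCol="probability", labelCol="INFANT_ALIVE_AT_REPORT")
print(evaluator.evaluate(predictions,
                         {evaluator.metricName: "areaUnderROC"}))
```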
Who's used a pipeline before, for scikit-learn or whatever? Pipelines are a very nice way of expressing your machine learning workflows, and the Spark people copied them from the scikit-learn community. In here, what I'm telling the code is: encode the data, create the features, and then run the logistic regression. OK, and I need to import the Pipeline class here. After I do this, I split my data into training and test sets, so you have less overfitting. And then you fit and transform your data, very similar to scikit-learn.

Yes, you put in the steps: I want to encode my data, and so on. This can be as long as you want; all the processes for indexing and creating features can live in the pipeline. And they're cool because when you want to run a new data frame through that pipeline, you only need to do something like pipeline.fit. Can you speak louder, please? They're actually objects: if you check here, I'm importing the classes and instantiating them, so in the pipeline I have these three objects, held in variables. So you run fit on your training set and transform on your test set. And now we wait for several minutes... no, this will be very fast. And that's it, you have your model.

There's even something called toPandas, because the show output is awful to look at; if your data set isn't too big, you can view it with pandas. At the end, what you see is that Spark created these features, the single column with all the information I told you about, and you have the raw prediction, the probability, and the actual prediction, right? If we want to measure how well we did, we use a BinaryClassificationEvaluator, and I measure the area under the ROC curve. Not bad, right? It's 74%.

So that was the idea. And with this same idea, they created the Spark deep learning library, and we're going there right now. So I'm going to close this. Still have time, great. I'm going to show you something very interesting: I'm going to do some classification of images with Spark, all right? And I'm going to classify flowers, because why not?

One of the bad things right now is that this project is not pip-installable or conda-installable; you have to import it as a Spark package. Luckily, I found a way to do this very easily, without changing the JSON for your Jupyter, or changing your kernel, or changing your environments. If you need a package for Spark, you only have to import os and then change this variable here, PYSPARK_SUBMIT_ARGS. I'm selecting the Databricks spark-deep-learning package, 1.1.1 for Spark 2.3, and here it is. So when I start my Spark session, I have all the deep learning stuff already there for me. When I do this, the first thing it does is go to the internet; if you don't have internet, this won't work, all right? It downloads the package and puts it inside the Spark context. I also uploaded the flower photos to the repo, so you can do this exact same thing, all right?
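Here is roughly what that trick looks like, as a sketch; the exact package coordinates depend on the release you pick from spark-packages.org, so treat the version string below as an example:

```python
import os

# Ask PySpark to fetch the spark-deep-learning package when the
# session starts. This must be set before the SparkSession is created.
# The coordinates are an example for a Spark 2.3 build of the package.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages databricks:spark-deep-learning:1.1.0-spark2.3-s_2.11 "
    "pyspark-shell"
)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flowers").getOrCreate()
```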
So this simple call is to show you some examples of the flowers we have; they're very beautiful. We have tulips, we have daisies. And we're going to try to run code that predicts what's a daisy and what's a tulip, all right? You don't need GPUs to run anything here; I'm going to run all of this on my laptop, and I have an Intel i5. What, sorry? Should you use a GPU? It depends on what you want to do. If you're using transfer learning and such, you may not need GPUs at all. If you're going to train a model from scratch, you're going to need a lot of money and you're going to need GPUs. But we're not doing that here. I'm going to show you how to transfer the learning from another model, one that Google created, to predict whether something is a daisy or a tulip. I'm not training anything from scratch, all right?

Since Spark 2.3, we have a way of reading images with Spark. This is very new for Spark and it's awesome, because you don't just get the raw image: it reads it and transforms it into the vector of pixels and so on that you would normally build as pre-processing for your images. It does that for you, all right? And it's very easy: you only have to import the ImageSchema and say, I want to read all the images from this folder. It goes there and reads all the different flowers for you, and I'm going to run it right now. It's very fast, and this is just a sample, to show you how it works. And yeah, it started Java right there, because when I ran this, it called the Spark deep learning library. So we have the result; it has some information about my images.

Here I'm doing something very similar to what I did before, with some differences, because I'm doing a more custom read of the data so it runs faster right now; then I split into training and test, for tulips and for daisies, all right? If you see odd things here, it's because I really don't have time to run the whole thing on all the flowers, so I'm going to use something like 80 flowers to build the model, all right? OK, so I'm going to run this right now, and let's see how many; I think less than 80, like 40 flowers. It's giving me some warnings, that's OK; just wait a little bit.

And if you want to see what Spark is doing, you can go to localhost:4040 and you have the UI here. If you click on a job, you can see exactly what Spark is doing: you have the DAG here, you can see what's happening with everything. You can even click on each one of these and see everything in detail, all right? When you're more familiar with Spark, you'll do this a lot to improve and optimize your code. All right, so it's not done yet; I'm going to wait a little bit. OK, it's running and running and running.

You have VGG16, and VGG19 too, I think; you have ResNet50, you have InceptionV3: you have your standard models. But you can also use models from the Keras library, and I'm going to show you that: if you have a model of your own that you trained before and you want to distribute it with Spark, you can do that too. In here I'm only using the simple stuff, but further down I'm going to show you how to load your own H5 file. I mean, this is so high-level that you'd have no idea what's going on underneath; just look at this code. Let me give you an example. You have something called the DeepImageFeaturizer, all right?
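As a sketch of what that transfer learning cell looks like, assuming the flower_photos folder from the repo with one subfolder per class; the split and the regression parameters here are illustrative:

```python
from pyspark.ml.image import ImageSchema
from pyspark.sql.functions import lit
from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline
from sparkdl import DeepImageFeaturizer

# Read each class from its folder and attach a numeric label:
# tulips are 1, daisies are 0.
tulips = ImageSchema.readImages("flower_photos/tulips").withColumn("label", lit(1))
daisies = ImageSchema.readImages("flower_photos/daisy").withColumn("label", lit(0))

tulips_train, tulips_test = tulips.randomSplit([0.8, 0.2])
daisies_train, daisies_test = daisies.randomSplit([0.8, 0.2])
train_df = tulips_train.union(daisies_train)

# InceptionV3 turns every image into a feature vector; a plain MLlib
# logistic regression then does the actual classification.
featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features",
                                 modelName="InceptionV3")
lr = LogisticRegression(maxIter=20, regParam=0.05, elasticNetParam=0.3,
                        labelCol="label")
p = Pipeline(stages=[featurizer, lr])

model = p.fit(train_df)
```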
And this is all you need for running deep learning with Spark right now. Yes, InceptionV3. We're not changing anything inside V3 right now; you could, in here, but I'm using the model as it comes, all right? What I'm saying here is: I want the images in, and I want features out. The way this works is that you now have all the features to predict an image, but you need an MLlib part, a decision tree or a logistic regression, to actually predict a label. If you look at the code in more detail, I put labels on the data: I said tulips are going to be one and daisies are going to be zero. So it's a basic classification problem. Yes, tulips and daisies, one and zero. That's one of the things you can do.

I don't know if I have time to run this, because I have like 15 minutes and it takes like two minutes per cell, but I ran it beforehand so you can see it. What I'm doing here is creating the featurizer for the deep images, then a logistic regression with some parameters. I'm using the defaults here, because I know this will output a column called label to predict; I did that here, you see, I called it label, but I could change it. Now I create a pipeline like before, and the pipeline holds the featurizer and then the regression, right? And then you just do p.fit, exactly as before; you pass the training data set, and then you have your model. You can also evaluate your model, and this is what we got, right? I created this UDF to show you how well it actually did: this one was supposed to be a one, but it was taken as a zero; you see here, this one was 0.8, so effectively a one; this was OK; 0.4 for a zero is OK; 0.61 is OK. So it's a comparison of the actual label against what we got.

You can do something even simpler, called the DeepImagePredictor. With the DeepImagePredictor, you only pass a data frame, and that's it: you don't need an MLlib part, you only need the predictor. It's the same thing: you read the images here, then you say, this is the input column, this is the output column for the predicted labels, and the model name. And I want to decode the predictions too and keep the top K, 10, all right? This is actually a transformer, so you don't fit anything here; after you run it, you only run transform, and that's it.

But there's a difference between the two. The first approach, below here, gives you the probability of being a zero or a one. This one does something very different. You know that InceptionV3 was trained on a specific set of image classes, like dog, or house, or flower, or daisy, or bee; there are about 1,000 of them, I don't remember exactly. And I'm just picking the top 10 here. Because if you look at the InceptionV3 classes from ImageNet, there are no tulips. So these are the images I'm trying to predict: two daisies and one tulip, all right? If you go down here, the first one is the tulip, but the model has no idea what a tulip is, so it says picket fence. Not that good, but not that bad. And for the other ones it said daisy, with 95% confidence, and daisy again, OK? This is the thing with transfer learning: you cannot get classes the model never saw; you're only classifying things it saw before. And yes, here it's a 40% chance of being a picket fence; that's the highest one we have, and if you look, the others are lower. So this is the weird part: it scored higher on picket fence than on daisy. This flower looked more like a picket fence than a daisy to the model. And that's deep learning, all right?
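For reference, here is a minimal sketch of that simpler approach; the sample folder path is assumed from the repo:

```python
from pyspark.ml.image import ImageSchema
from sparkdl import DeepImagePredictor

sample_df = ImageSchema.readImages("flower_photos/sample")

# No MLlib stage and no fitting: this transformer applies InceptionV3
# directly and decodes the top 10 ImageNet classes for each image.
predictor = DeepImagePredictor(inputCol="image",
                               outputCol="predicted_labels",
                               modelName="InceptionV3",
                               decodePredictions=True,
                               topK=10)

predictions_df = predictor.transform(sample_df)
predictions_df.select("image.origin", "predicted_labels").show(truncate=False)
```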
All right, so now I'm going to show you something for those of you who come from Keras; there was a question about this. Say you have weights for your model. Here I'm doing the same thing: I import InceptionV3 and I export its weights. But if you have your own model with your own weights, you can save it as an H5 file. And then, from Spark, you use something called the KerasImageFileTransformer, all right? With this, and this is the actual code here, you pass it your model file, and it distributes your Keras model over your cluster. This is quite awesome, because without thinking too hard, you just save your H5 file and then distribute it over the cluster. And this part is just an image loader, to define the sizes of the images and preprocess them.

OK, and I'm going to show you something else, which is this here. Let's say you don't have that H5 model: you want to build a Keras model first, and after that distribute it over your cluster. You can do that too with Spark. How? Here I have my Keras model, very simple: with the Sequential API, two dense layers, and that's it. Then I save this model I created. Then, with something called the KerasTransformer, you give it the name of the input column, the output column you want to create, and the model file, OK? And then you can run this on your cluster. It's a very easy way to distribute your Keras code over the cluster.

Does it use data frames? Yes, data frames, because one thing I didn't mention is that in Python, the RDDs are much slower than in Scala and Java. So, you have MLlib, right? But inside of MLlib there's ML, which is like the data frame version of MLlib, and everything you see here uses ML, because it's faster and it's better for the optimizer.

I can't hear you, sorry. Yeah? Can you hear me? Yeah, I can hear you now. Are all of these running as a single job? How do we see the number of processes or jobs? So if I actually run this, you'll see a lot of things here: you'll see the jobs, and the jobs are divided into tasks. And when you look at the tasks, you can see what's actually happening in the cluster, under the hood. Can we control that, the number of tasks or jobs? Control it? That's not so easy. Spark has its own way of working, and you can't just say, I want to run this in two tasks; it doesn't work that way. What you can do, and you saw at the beginning that I did some repartitioning here, is divide the data into more partitions, and so more tasks, because that's even faster for some kinds of data. But it's not that easy to control how many tasks a job will have. Thank you.
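As a rough sketch of that second workflow: a tiny made-up Sequential model applied to a made-up data frame of length-10 feature arrays; the column names and sizes are just for illustration:

```python
from keras.models import Sequential
from keras.layers import Dense
from sparkdl import KerasTransformer

# A tiny Keras model: two dense layers, nothing fancy. Save it as H5.
model = Sequential()
model.add(Dense(10, input_dim=10, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.save("simple_model.h5")

# A toy data frame where each row holds an array of 10 features.
input_df = spark.createDataFrame([([0.0] * 10,), ([1.0] * 10,)],
                                 ["features"])

# Spark applies the saved model to every row of the input column,
# distributed over the cluster.
transformer = KerasTransformer(inputCol="features",
                               outputCol="predictions",
                               modelFile="simple_model.h5")
predictions_df = transformer.transform(input_df)
```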
All right, so, OK, we're here now, and now the last part. The last part, thank you. So the last part is: what if you want someone who has no idea about deep learning to use your code? In Spark, with the deep learning package, you can create a SQL function out of your deep learning code. Then someone else can just do a SELECT, pass the function and the data, and they have deep learning in SQL.

So this is how you do it, and I'm actually going to run this code. In here, what I'm doing is... it's not going to work, right? Maybe not, because I have to run some things first. Oh, I have to run this, I think. OK, if it doesn't run, that's fine. I'm creating a UDF; in Spark, a UDF is a user-defined function. And what I'm doing here, OK, it's running, I think, is registering this distributed Keras code as a function for SQL. And afterwards, when someone else uses my code, they don't need to know anything about deep learning; they just need to know Spark and SQL, OK? All they have to do is this: read the data, register it as a temp table, all right? Yeah, so it's working there. And then this is all they write: SELECT inception_retrieve, that's the name of the function I created; image here is the name of the column; with an alias as prediction; FROM sample_images. And it worked, OK? So I can do this, I can read it here; I'm registering a temp table.

But something else is happening here, all right? I'm going to finish with one more example; I'm running out of time. This output is not that pretty right now, because it shows the ImageNet class indexes and the probability for each one: like, this is very unlikely to be this class, and I have no idea which class this index is. I'd have to map each one of these indexes back to the ImageNet class names. So right now it's not that pretty, but you can do it.
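The registration step looks roughly like this; registerKerasImageUDF is the helper in sparkdl for this, and the function and table names below are just examples:

```python
from keras.applications import InceptionV3
from sparkdl.udf.keras_image_model import registerKerasImageUDF
from pyspark.ml.image import ImageSchema

# Register a Keras model as a SQL function.
registerKerasImageUDF("inception_predict",
                      InceptionV3(weights="imagenet"))

# Now anyone who knows SQL can run deep learning.
image_df = ImageSchema.readImages("flower_photos/sample")
image_df.registerTempTable("sample_images")

spark.sql(
    "SELECT inception_predict(image) AS prediction FROM sample_images"
).show()
```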
OK, so I'm going to close this and go to the last example. The last example is with Optimus. Optimus is a library that I created with a friend from Venezuela and with the team at Iron, our company there, and it helps you use Spark very easily. It has a lot of functions that let you do machine learning, deep learning, feature transformation, and data cleansing very, very easily. If you go to the repo, there's information about Optimus: how to install it, how to load data, how to clean data; you have .rows and .cols for data transformation; you have ways to remove characters, drop columns, rename things, apply functions; and you can run machine learning very easily, data profiling, that kind of thing. I'm only going to show you the deep learning part of Optimus here, all right? So first, you import Optimus from optimus. And when you start it, you say dl equals true, which means you want to load all the deep learning parts of Optimus. And this is the same example as before; I read the data, and the same as before, I use lit one here and lit zero there. So, oh man, go back. I think this worked.

In Optimus, this is how you work with things: you'll never use fit and transform again, only very intuitive calls. From the Optimus deep learning module, the image classifier LR is the first one I'm showing you: you give it your featurizer, your predictor, and your data frame here. That's all you have to do with Optimus, and it creates two things for you: your model and your data frame with the predictions, OK? I don't think I have time to run it, but this is how you would do it. And you can evaluate it just as simply: op.dl, evaluate image classifier, you pass the test set and the model, and that's it. And the version where you only pass the folder with the images is op.dl, image predictor. This does everything for you, and in a nicer way: it only gives you the best class. Remember the top-10 output with the weird stuff in the predictions? Here you get just the best one, the one you actually want to see, OK? Right now I'm in the process of bringing everything you have in Spark into Optimus. You already have random forest with one line of code, and things like string indexers, all in one line. It's much simpler than using the raw Spark API. And there's more information about using this for data cleansing. The goal of Optimus is to bring the best of pandas and dplyr to Spark. All right?

So that's it from me; let me give you some next steps. First, if you want to see this code again and run it yourself, this is the actual code. I'm also running something called Data Science Live with Randy Lao and Kristen Kehrer; our second session is September 6th, and we answer your questions about how to do data science, what data science is, whatever you want, all right? We had a webinar last week on how to get a job in data science.

Question from the audience: suppose some streaming data is coming from Kafka, we implement, say, a logistic regression, and we want to make real-time decisions with the model? You can do that with Spark. With Optimus, you can't do anything with streaming right now, but with Spark it's very easy to do streaming analytics. So yes, we can ingest from Kafka; we can do that.

One last thing: I'm running a course with Matt Dancho called Data Science for Business with R, and this is a coupon code that only works until September 5th; it's a 20% discount for that course. I'm creating the Python version of the course right now; I think the first chapter will be released next month, and I'll show how to do deep learning with Spark and machine learning with Spark in Python; I'm going to use LIME and all these different things, all right? Lastly, if you like, subscribe to our channel, Data Science Office Hours. We run discussions there all the time about data science, what to do, what not to do. And that's it, thank you very much.

Yeah, so I have the book, and I have one question for you; whoever answers first gets the book, all right? So this is the question: which types of flowers was I predicting with Spark just now? ...OK, that was maybe a bad question; I have no idea who answered first, so it's just luck. So, something more complicated: what's the thing we need to do in Spark to be able to do machine learning, the first thing we have to do? No, not just having a laptop. What, sorry? You need to do something in particular to be able to run it, to actually run your code on some variables. You need to do something to them. Assemble them, assemble them. All right, that's it: assemble them into a vector, yeah?
So, if you want more stuff, just come here right now and I'll give it to you.