Okay, so let me introduce myself a little bit. My name is Joe Ray. I'm currently working as a data scientist at Shopee. Our team does a lot of e-commerce related machine learning projects, such as the recommender system, fraud detection, and paid ads. Before joining Shopee, I got my PhD degree from NTU, where my research topic was natural language processing and deep learning. Today's talk will cover two parts. The first part is why Python is so popular in the machine learning and data science area, and the Python packages that we use a lot. In the second part, I will pick three basic tools that I use a lot.

Okay, so let's start with this figure. This figure is from the Indeed website, which is a job-posting website. Each line represents the percentage of matching job postings for a query, where the queries combine various programming languages with machine learning and data science. If the number is high, it means demand is high. The data is a bit outdated, but as we can see in this figure, after 2016 Python has been ranked number one in machine learning and data science. So Python is the most popular programming language in the machine learning and data science areas.

Why is Python so popular in data science? Based on my understanding, there are three factors. The first is that Python is simple, elegant, and consistent. Compared to C and C++, Python is a very high-level programming language, so it's very easy for us to pick up. It also achieves a good trade-off between complexity and performance, because as we all know, the most-used Python interpreter is CPython, which is written in C, so the performance is guaranteed.
Also, because Python is open source, it provides a full collection of tools to do machine learning modeling and to deploy machine learning models. In our team we use a lot of Python tools, so here is a very brief summary of the ones we use most, grouped into four aspects.

The first aspect is data processing. We use pandas, which is kind of like Excel in Python — I think our colleague from the operations team will introduce it later. We also have Matplotlib, which helps us do data visualization, and multiprocessing tools, which help us do parallel computing to speed up our processes. For the machine learning part, we use scikit-learn, which provides implementations of a lot of machine learning models, and we also have scikit-image and OpenCV for image processing. If we want to do natural language processing, we use Gensim and NLTK, and if we want to build deep learning models, we rely on TensorFlow and Keras. If we want to deploy machine learning models on a distributed computing platform, we use Spark's machine learning library. After model deployment, we can serve our models behind a web API, using Flask and Gunicorn. We also use a lot of other open-source tools: for example XGBoost and LightGBM to realize some machine learning models, Redis when we want a very efficient in-memory database, Airflow when we want job scheduling — the cron-job kind of things — done automatically, and the Jupyter Notebook, which is a kind of web-based interface for rapid prototyping of machine learning projects. So today I will pick three of these basic tools.
Okay, so today I will pick three tools. I will not touch the machine learning part, because today's conference is about Python itself. I will talk about three basic tools that actually help us a lot when we do machine learning model deployment.

The first one is named virtualenv. virtualenv is a very strong Python package that helps us manage a Python project's environment. Why is controlling Python dependencies so important? Let me take TensorFlow as an example. Actually, I want to ask: how many of you have heard of or used TensorFlow before? Quite a lot, cool. For those of you who know it or use it, you must know that TensorFlow has a lot of versions. TensorFlow is a scientific computing library developed by Google, and Google maintains the package very actively — I think a new version comes out every two months. So for example, say two months ago we were building a machine learning model on TensorFlow r1.5. Two months pass, and now we want to deploy our model, so we need to transfer our project from our previous server to another server — maybe the live server, right? We just copy our code and install TensorFlow, but we may wrongly install the newest version, for example r1.7. Between different versions the syntax may be different, so it may cause problems if we just run this project. This means it is very, very important to keep our Python project's dependencies clean, in a self-contained subdirectory; otherwise it is not easy to transfer the project from one place to another. So how does the virtual environment help us do that? The virtual environment helps you manage Python environments.
Based on virtualenv, we can create a complete and self-contained Python runtime environment in a subdirectory, and in this way we can install all the Python packages specific to one project into it. So now I'm going to do a quick demo of virtualenv. Okay, I'm connecting to the streaming computer. Okay, cool, so we've arrived at a terminal. First I go into this folder to make it clean — there's nothing here. If we want to create a virtual environment, we only need to type `virtualenv` followed by a folder name. The folder name will actually be the environment name. You can see we now have a new Python in this folder, and it installs pip and setuptools for us; it may take some time to build the virtual environment. Okay, cool, it's done. So what this command does is create a folder — you can see we have a folder here — and under it, some Python packages and also pip. If we want to activate this environment, we need to say `source`, then the folder name (which is also the environment name), then `bin/activate`. You can see the environment name, test, pops up in front of the command line. If we want to check which Python environment we are in, we only need to type `which python`, and it points to this specific location.
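The demo above boils down to a few shell commands. This is a minimal recap, assuming a Unix shell; the environment name `test` is just the one used in the demo, and on modern Python, `python3 -m venv test` is the standard-library equivalent of `virtualenv test`:

```shell
# Create a new virtual environment in a folder named "test"
virtualenv test

# Activate it: the prompt now shows (test) in front of the command line
source test/bin/activate

# Confirm which interpreter is active — it points inside the test/ folder
which python

# Leave the environment again when done
deactivate
```

Everything installed with pip while the environment is active lands inside `test/`, so the project's dependencies stay self-contained.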
For comparison, if I go back to my default setting and type `which python`, it points to a different location. So this environment is independent from the other one; the two environments will not disturb each other. If, for example, we want to install a specific NumPy version, we only need to type `pip install` with that version, and at the same time we can check which NumPy version is in our default Python setting. So: `import numpy`, then print the NumPy version — okay, the version is 1.8, right? At the same time, in the virtual environment, NumPy is also ready, so let's test it. This NumPy should be the specific version that I installed before, 1.11.1. If I print the NumPy version — so it's the exact version that we want. Also, a good thing is that pip provides a very handy function to help us record the Python packages we have already installed. We only need to use `pip freeze`, and it lists all the packages you have installed. If you want to record them to a local file, you can just redirect this package information to a txt file. Okay, so we just generated this txt. By doing it this way, we have this local file, right, requirements.txt. So next time, if we want to replicate the Python environment, we only need to copy this txt file to another location, and if we want to install the same packages, we only need to create a new environment and then use `pip install -r` pointing at this local file.
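The freeze-and-restore workflow from the demo can be summarized like this; the NumPy version is just the one pinned in the demo, and the install step assumes the target environment is already activated:

```shell
# Inside the project environment: pin a specific package version
pip install numpy==1.11.1

# Record everything installed into a local file
pip freeze > requirements.txt

# Later, in a fresh environment on another machine:
pip install -r requirements.txt
```

Because `requirements.txt` records exact versions, the replicated environment matches the original one package for package.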
And because we're still in that environment, let me `deactivate` — then we exit the current environment. By doing it this way, we can easily manage our Python environments and do the project transfer, okay? How many environments can we create? Are there limitations? There's no limitation — we can create as many as we want. Yeah, okay, cool. Michelle, could you go back to the slides? Cool.

Okay, so based on the previous demo, virtualenv provides these basic commands. We use `virtualenv` followed by the environment name, and if we want to point our environment at a specific Python location, we just pass the Python location to this command. If we want to activate the environment, we only need to say `source`, then the environment name, then `bin/activate`, and if we want to exit the environment, we only need to say `deactivate`. Together with the two pip commands, these let us easily transfer our projects.

Okay, so the second tool that I want to introduce is the Jupyter Notebook. The Jupyter Notebook is a kind of browser-based interface for rapid prototyping of machine learning and data science projects. On its official website, they introduce themselves as follows: Jupyter is an open-source project aiming at creating a better working experience for data scientists. The Jupyter Notebook has the following good points. The first is that it provides interactive scripting support. The second is that it is browser-based.
So it's very convenient to open up, and it supports content such as text, code, graphs, and even videos and pictures. Because of these points, it's very convenient for us to share our code and do collaborative work. I'm going to do a quick demo of the Jupyter Notebook as well. Okay, so I have created an environment before, so I need to activate it; this environment already has the Jupyter Notebook and the other packages for the demo installed. If we want to open the Jupyter Notebook, we just need to type `jupyter notebook`, and the browser will automatically open up the interface. Okay — I need to change the screen, give me a moment. Okay, here we have arrived at the Notebook interface; this is what the Notebook looks like. If we want to create a new notebook, we only need to click New and pick our kernel, which is based on Python 2, and it will create a notebook. So that's what a notebook looks like. The first thing is that we can change the notebook name: we only need to click "Untitled" and put, for example, hello PyCon SG. Okay, so this thing is named the cell structure, which is a very important concept for the Jupyter Notebook. In a cell we can put code, or we can even put Markdown text here if you are familiar with the Markdown syntax. Because of the cell structure, we can easily segment our code into different parts. Here I will give an example. Say you want to import NumPy: we only need to click Run, and the NumPy package will be imported here, and if we want to generate some random numbers, we just need to type `a = np.` —
— and if we forget the exact name, we only need to hit Tab, and it gives us automatic suggestions. Let's zoom in on this text, okay? So we type `random`, and if I still can't remember the function name, I can hit Tab again and it will give me suggestions; and if we don't remember a function's usage, we only need to hit Shift-Tab, and it will give the documentation for the function. Here we want to generate 10 random numbers, so we only need to do it this way — oh sorry, I think I need to pass a size here — and it will generate the numbers, and we can just print them. Also, we can even plot this array. (I think the mouse is not so smooth, okay?) And if we know the Markdown syntax, we can also turn a cell into Markdown and write "this is a notebook"; so natural text can be blended into the notebook very smoothly. This kind of structure helps us do very rapid, very fast prototyping, and we can also share our code very easily, because it carries a lot of explanatory text and also images. Okay, so that's the Jupyter Notebook part. Okay, so let's go back to the slides.

Okay, so finally we come to the last point, which is parallel computing. What does parallel computing do? Parallel computing helps us use a CPU with multiple cores, so it can help us speed up our computation. But actually, Python is not well suited for parallel computing, because of the Global Interpreter Lock: at any point in time, Python only allows one thread to execute Python code. But Python has a multiprocessing package.
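The notebook cells from this part of the demo come down to something like the following, written here as a plain script; in the notebook, each statement would live in its own cell, and plotting would render inline:

```python
import numpy as np

# In the notebook, Tab completes names like np.random.<Tab>,
# and Shift-Tab shows a function's docstring.

# Generate 10 random floats in [0, 1) — the size argument is the one
# the demo initially forgot to pass.
a = np.random.rand(10)
print(a)

# In the notebook we could also plot the array in the next cell, e.g.:
#   import matplotlib.pyplot as plt
#   plt.plot(a)
```

A Markdown cell above or below these code cells then carries the explanatory text, which is what makes the notebook easy to share.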
Because of the multiprocessing package, we can spawn multiple subprocesses to overcome the disadvantage of the Global Interpreter Lock. Here are some good points, and also some bad points, of the multiprocessing package. First, it takes advantage of multiple CPUs, and it is also very robust, because the failure of one process will not affect the others. But because multiprocessing runs the jobs in separate memory spaces, it definitely consumes a lot of memory. Also, the multiprocessing package is good for CPU-bound tasks: if your task needs a lot of mathematical computation, you can use multiprocessing. But if your task is I/O-bound, which means you need to process files from the hard disk, you may consider the multithreading package instead. Okay, so because of the time limit, I will only introduce one part of multiprocessing, which is named Pool. The Pool class from the multiprocessing module can be used to split our tasks into different chunks, and each chunk will be executed in parallel. A Pool represents a pool of one or more worker processes that can independently execute the available function calls. I will use the Jupyter Notebook that I introduced before to illustrate this idea. Okay, actually I have already prepared a notebook file here, so I just need to open it up. Okay, so we have arrived at the notebook. The first part is process pools. In the first cell block, we define a compute function. What does the compute function do? It is a very trivial task: given a positive integer n, we calculate the summation of the squares of the numbers from one to n.
So we run this code block, and then, for example, we randomly generate 40,000 random numbers, sampled from the range from zero to one thousand, and we can just check the first 10 numbers, which are shown here. A very intuitive way to apply the function we defined before to this array is to apply it sequentially: we just do a for loop over these numbers, and then we feed them one by one into the compute operation. Then we can run this block — it may take some time, because it's sequential — and we use the time package to measure it. We can see that if we want to process 40,000 numbers, it takes us about 7.5 seconds, okay? So here is the parallel processing. We can split the tasks in this array into different chunks, and the different chunks can be processed in parallel. (Here is also some Markdown syntax — we can just insert an image into the notebook.) If we want to use a multiprocessing pool, we only need to call this command: we define the multiprocessing Pool and pass the number of cores into this function. Here we set the number of cores to two, which means we will use two cores at the same time. So we first initialize this pool.