Hello everybody. I know this is a bad time because most people are in the lunch area right now, but we want to share with you a pet project that Moises and I created over the last months. First of all, we want to thank the organization for choosing us to be here. This is very important for us because, as I said before, it's a pet project, and we want to share it with all the people who might be interested in this kind of thing and invite you to join us in creating a very powerful tool.

First of all, let us introduce ourselves. Moises? Hello everybody. I'm Moises Martinez. Currently I'm a big data architect at an international company. I have been a researcher at different universities, working on machine learning and related fields. I'm an organizer of TechFest. If you don't know what TechFest is, I invite you to join us in March, and if you want to share your knowledge or anything you are working on, we have a call for papers open and everybody is invited to send proposals. This is not me. This is Moises again. Okay. This is me. I am Ines Huertas. I work with big data and machine learning. I am an entrepreneur; currently I am working as an advisor for two companies, one called TIF and the other the Wayside project, where we are trying to build a very advanced product for the market. If you are interested, we can talk afterwards. I take part in the NASA open data program, a program from NASA to explore and exploit the open data they have in their archives. I am a founder of WBU Startup C, an association for women in business where we try to promote it and build some networking. And I am a trainer on learning platforms such as Platzi. Here you have our contact details.

But we are here to talk about other kinds of things. Who here works with machine learning every day? Wow. Cool. We are not that many, but some people do work with machine learning. Who knows what Kubeflow is? Wow. You are on the cutting edge. Kubeflow is the tool that we want to share with you in this session: how it works and how we created these pipelines for bird migration. But Moises will talk about that in more depth later. And now, who knows about birds? Really? That's amazing. Cool. I think this is not the most typical forum to talk about birds. But why not?

Because about six months ago, I think, Moises and I were looking for a new project, a project to work on together. We met each other two or three years ago and had been talking about the possibility of working together. We said, okay, maybe we can look for a project, a social project, and try to build something together. And I think it was before the summer when he called me and said, "I have the data." And I said, "I am ready. I don't know what the data is, but I don't care." He started to explain that he had found a dataset about bird migration that could be interesting to explore. And I said, okay, I know nothing about birds. Really, I knew nothing about birds. And he explained: maybe we can find some factors that correlate with bird migration. And I said, okay, first we have to study a little bit more about birds.

So here is a short introduction. Bird migration consists of regular seasonal trips, often to the north in summer and to the south in winter, looking for warm areas. Birds migrate to get better conditions and a better quality of life: they need warm temperatures, so they have to move when the seasons change.
Okay, so bird migrations are seasonal, and they start with a change in the temperature or in the climatology of the areas, and this change is based on variations in temperature, among other factors. So what has happened with the temperature in the last century? I mean, climate change is something that is happening. If we take a look at the global average temperature over the last century, we can see that there is a slope, that it is rising. And we can think: okay, if the temperature is changing, maybe the routes, or the dates, or the duration of the bird migrations are changing too.

We also started to think about other factors that could apply to this problem. For example, we started to think about carbon dioxide. Carbon dioxide is very correlated with the temperature; I mean, the concentration of carbon dioxide is growing in parallel with the temperature. And at that point we started to think: okay, if bird migration depends on the change of the seasons, and the seasons are affected by different factors like temperature or carbon dioxide, maybe we can find something that correlates. And, on the other hand, we found this kind of news and studies, so it is not only us saying "okay, maybe it could be": we found that the BBC, the Guardian, and the European Union have studies that say something like that.

Then Moises and I said, okay, we have to try to create the CORO project. The CORO project tries to find patterns that affect bird migration, and to find external variables that can affect it too, all of this using machine learning. And what do we use to deploy everything? Because, okay, we can find the data, but we need an infrastructure on which to deploy all these things. We also want to do it in an automated way, so that it is easy to use. And for that, we use the Google Cloud Platform and a lot of tools that Moises will tell us about.

But let's start at the beginning: what is machine learning? I know that if you work every day with machine learning this is very easy for you, so this is just a little summary. We can have samples with different information and features, and these samples can be labelled. When we say that the data is labelled, it means that the data is associated with some status. With this kind of data, we can train algorithms: we input the labelled data, build a model, and then try to predict the output for new raw inputs. This kind of algorithm is called supervised learning. There is another kind of data where we have only the samples, without labels. For this kind of data, we usually use algorithms that can segment or classify the samples into groups according to the features that the data has. This kind of algorithm is called unsupervised learning. And then we have another kind of data, which is a set of actions, states, and rewards, where we work with agents and an environment: the agent decides on actions and updates its state, and with these algorithms we try to predict what the next action should be. This kind of algorithm is called reinforcement learning.

In our project we have different kinds of data, as we will show you, and we mainly use supervised and unsupervised learning algorithms: trying to predict temperature change and carbon dioxide from the migration data, or identifying types of birds using unsupervised methods. And now Moises will tell us how we created all of this infrastructure to get more information about the behaviour. Okay.
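As a quick, illustrative sketch of the two families of algorithms we ended up using, here is a minimal Python example with scikit-learn; the arrays are toy data, not our bird dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised learning: features X come with labels y,
# so the model learns a mapping X -> y.
X = np.array([[1.0], [2.0], [3.0], [4.0]])   # e.g. CO2 concentration (toy)
y = np.array([14.1, 14.3, 14.6, 14.9])       # e.g. average temperature (toy)
model = LinearRegression().fit(X, y)
print(model.predict([[5.0]]))                # predict for an unseen input

# Unsupervised learning: only samples, no labels,
# so the algorithm groups them by similarity.
samples = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(samples)
print(clusters)                              # two groups, e.g. [0 0 1 1]
```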
Then, after finding the data and defining the different machine learning algorithms that we could use, the next question we asked ourselves was how we could create models using those algorithms. The first problem we had was that the data we found was huge: many, many gigabytes of data, with information about location and different factors related to the temperature, the climate, the weather, and many other things. And the problem was that we could not manage all of this data on a normal computer. So we had to define an infrastructure, or something like that, to do this.

Usually, when we have to face a machine learning problem, we have to define a training process and an evaluation process. This is the usual, normal process: we have training data and a group of algorithms, and we have to create several models until we choose the one that obtains the best results for the thing we are doing. After that, we have to deploy this model, okay? We need an environment in which we can deploy this model, in an automated way if possible, to execute the inference that allows us to get our results, depending on the thing we are predicting or classifying. Then the question is: okay, we have all of this, and how can we do it? How can we automate this process? How can we make it easier, given the large amount of data that we have? One option is to use Kubeflow. Kubeflow is an alternative for creating pipelines in which several processes are executed in a sequential or parallel way, creating something like machine learning pipelines.

And, okay, perfect. We have the technology, we have the data, we have the different algorithms, and now we have to choose how we are going to construct or build this machine learning process, this machine learning system. There is a really famous paper from a machine learning team at Google ("Hidden Technical Debt in Machine Learning Systems") in which they say that usually the machine learning part, the part in which we create the models, is the smallest part of a real system. So if you are working only on the machine learning, someone has to provide all the rest of the parts. In our case, initially we were focused on creating the model, but we wanted to create models that could be used by other people, by the community, or by other kinds of technologies. So we had to face this problem. To try to solve it in a simple way, we used Kubeflow, which combines Kubernetes and machine learning and allows us to create really simple pipelines for our machine learning algorithms.

But, okay, how does this work? I suppose that everybody knows how Kubernetes works. Who knows what Kubernetes is and how it works? Many people, I suppose, because there are many talks about Kubernetes here today, okay. This is the normal structure of the cloud environments we are standing on. We have different kinds of cloud environments, Google, Amazon, or Microsoft; on top of that, the structure of the computers and the operating system; and then we have that middleware, Kubernetes, which allows us to run a lot of different machines or different containers, divided into pods, that execute different kinds of services: all of the elements that we have in a real machine learning system, not only the machine learning part. And how does Kubeflow work? Kubeflow offers another middleware layer on top of Kubernetes that allows us to execute all of these things using the pods and the infrastructure that Kubernetes deploys. And what is inside Kubeflow? All of this and much more.
There are many tools related to machine learning, starting with TensorFlow, of course: TensorFlow, Argo, Seldon Core, Keras, everything that we need to create our models and deploy them in a simple way. Perfect. Okay. We have the data, we have the technology, we have everything. Then we can start our pet project and try to create a model to see whether it is possible to predict climate change. Okay. But to do that, we have to create a pipeline in Kubeflow, and how does this work? How can we create a pipeline in Kubeflow?

Usually, pipelines in machine learning environments are full of processes, different kinds of processes, related to data ingestion, data storage, visualization, exploration of the data, data transformation, data verification. And we are not interested in any of this: we are more focused on how to create models in an easy way, to allow people to use them. Okay. So we need a tool that allows us to automate or simplify the process of deploying all of this, and the way is to use Kubeflow.

Kubeflow is really easy to deploy. Well, not really: if you want to install it inside your own cloud, you have to do some things by hand. But, fortunately, Google offers a system that deploys Kubeflow using a web form. Okay. This form that we can see is the easiest way to deploy Kubeflow on the Google Cloud Platform. The only things that we need are an account, of course, which is really easy to get, and a project: we have to create a project inside our Google Cloud Platform. To do this, we go to the website that is there at the bottom of the slide, and we fill in the form. The two most important things are the authentication mode, because you can use the credentials that Google offers you or you can choose a username and password system, and the location or zone in which you are going to deploy all the elements of your cluster. And the version, of course. Some months ago you had different versions of Kubeflow to choose from, but now there is only one; there is not a lot to choose there.

But when we try to do this, we can face a problem. If you are using the free tier, which many people use for small projects or to try new technologies or things like that, the authentication with the Google Cloud Platform credentials does not work really well. For this reason, you can use the login and password option. Okay. Once you can create your own cluster, you can deploy it, but first you have to set up everything related to authorization: you have to define the name, the superuser email, and the different domains that are authorized to access your cluster. After all of this, and 30 minutes more, you have your own Kubeflow cluster deployed on your Google Cloud Platform, which is amazing, because if you don't know anything about infrastructure, architecture, or things like that, this is really good for you, and you can focus on the important thing: creating a machine learning pipeline.

Okay. Then how can we do that? We can use this website, which is the one offered by the system that we deployed, and we can start to use the different resources that Kubeflow offers, which are two. The first one is this dashboard, which brings together all the things related to Kubeflow. It is really similar to the Google Cloud Platform one, but smaller. And here are the two important tools. The first one is the notebook server. Okay.
Everything that you want to deploy in Kubeflow can be developed using a notebook, in your own console or on your own system, but if you don't know much about this, the notebook server is a good way to do all of it. And the other tool is, of course, the pipelines and runs server. This is the system in which we deploy our pipelines and our servers, run our experiments, and automate all the things related to the machine learning process. Using these two tools, we can create our pipelines without caring about anything related to the infrastructure or the requirements of the architecture.

Okay. Perfect. We are going to create our pipeline to try to build that model to predict the climate and weather. Okay. The first thing is the information, and this was the most important problem. Okay. We found a good amount of information published by NASA and the government of the United States. But in the end, that information wasn't really good, because it consisted of pictures in which the locations of the different groups of birds were plotted on a map. And we had to find more information. Finally, after two months looking for information, we found this paper, and with it more information from several radars located in different parts of the United States and Canada. And we started to work with that information. The information wasn't really good either, but there was a lot of it, a lot of gigabytes. However, the information was related to many things, and not so much to migration and related topics.

And the other thing was temperature. Of course, from the beginning of this project, we wanted to see whether the temperature of the environment had anything to do with the migration of the birds. And for this reason, we chose this dataset from NASA, in which we have the variation of the temperature over the different years. Okay. We got the data, and we had to see how to join all of it and prepare our pipeline.

This is the pipeline, and it's a really simple one. No parallel steps, only sequential ones. We have our Jupyter notebook and three processes. One for data ingestion and data loading, in which we introduce the information that we want; here we used Google Cloud Storage and Google BigQuery. Then a training phase, which creates our models, using Google Cloud Storage and BigQuery for further analysis. And finally, we have a serving process. All of this solved the problem, because we had a full machine learning pipeline, and everything related to the infrastructure was delegated to Kubeflow.

Perfect. Good. Then how did we create this? The first thing, and where we spent most of our time, was the data. For a long time, we did not really understand the data: we didn't have any information about the different measures and the different things related to it. In this table, you can see some of the attributes that we have. For example, we have the period during which the sampling from the different radars located in the United States and Canada was running. Of course, we have the weather, defined as a percentage. We have the air pressure. We have the amount of snow. We have a lot of information related to the weather conditions, and not much information related to the birds: only the movements of the birds, located by longitude and latitude, and not much more. But that is the only thing we have. So we started to work with it, and we tried to combine all of this information with the temperatures.
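To give an idea of what this preparation step can look like, here is a minimal sketch of joining the radar observations with a yearly NASA temperature series using pandas; the file names and column names are hypothetical stand-ins for the real attributes in the radar dataset:

```python
import pandas as pd

# Radar observations: one row per detected bird movement
# (hypothetical column names for illustration).
radar = pd.read_csv("radar_observations.csv",
                    usecols=["timestamp", "latitude", "longitude",
                             "air_pressure", "snow_amount"])
radar["year"] = pd.to_datetime(radar["timestamp"]).dt.year

# NASA dataset: yearly global temperature anomaly.
temps = pd.read_csv("nasa_temperature_anomaly.csv")  # columns: year, anomaly

# Combine both sources into the dataset used for training.
dataset = radar.merge(temps, on="year", how="left")
dataset.to_csv("combined_dataset.csv", index=False)
```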
After that, we started to create our pipelines. The first thing we had to create was our ingestion phase, or data loader. The first thing we defined there were two processes, one of them for data splitting. We were collecting a lot of information, 20 years of collected files across different datasets, because there is a lot of information, and we tried to combine them with the temperature and other data about nature. This process became an operation in Kubeflow; it was the first step of the pipeline.

How did we do it? I don't know if you know much about Kubeflow, but this is the way to create a Python component. The different elements of the pipeline can be components, which you can reuse or not, as you want, but they have to be defined in this way. Why? Because each component has to be executed like a pod, like a container. Then we have to define some configuration elements. Here we define the basic elements of the component, which is this part. After that, we have to relate this to a piece of source code. This is a simple piece of code that collects information from Google Cloud Storage, takes all of that information, the different files, creates a global dataset combined with the temperature, and puts the information back into Google Cloud Storage. After that, when we have the information, we have to connect this component with the next element of the pipeline, and we have to define it. Then we have to turn this diagram into the components that we are going to use in the pipeline.

Okay, and how can we do that? We have to write code again. In this case, the code is a bit simpler, I suppose, because everybody knows more or less how to work with machine learning. The first part is like before: we have the definition of the component, in this case the training component, with the configuration for the container that is going to execute the different models, in this case this training process. This is the location of the container, because when the container is created it is stored in the Google Cloud Storage system, which is the best way for our cluster to fetch it and execute it. Then we have to define the Cloud Storage location in which we are going to save the model, the base image that we are going to use to create our container, the location in which we are going to put that container, and how it is going to be used. And we have to define the function that is going to be executed inside the container, which is this train function that we have here. What this train function has to do is connect to Google Cloud Storage again and create all the different data splits. Okay, all of this is not inside the code of the pipeline itself but somewhere else, because there is a lot of it and the code is really long.

After that, we have to define the different machine learning models that we are going to use. This is the first pipeline that we created, a really simple one: we only used a deep neural network. Okay, in this case it was a deep neural network regressor, because the thing that we wanted to predict was the location of the birds: to try to see whether there is any variation in their movements, and to analyze whether that happened because of the change in the weather or temperature conditions. And after that, we have to do the normal things of a machine learning process: a training process and an evaluation process. And finally, we return the results of our system and the trained model. Okay, we have trained our system. We have that piece.
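To make this concrete, here is a minimal sketch of what such a training component can look like with the Kubeflow Pipelines (KFP) SDK, using `func_to_container_op`; the bucket path, feature columns, and base image are hypothetical, not the actual code from the talk:

```python
import kfp.components as comp

def train(bucket: str) -> str:
    """Train a DNN regressor on the combined dataset and save it to GCS."""
    # Imports live inside the function because KFP serializes the body
    # to run standalone inside the component's container.
    import pandas as pd
    import tensorflow as tf

    # Load the dataset prepared by the previous pipeline step
    # (reading gs:// paths assumes gcsfs is available in the image).
    data = pd.read_csv(f"gs://{bucket}/combined_dataset.csv")
    features = ["anomaly", "air_pressure", "snow_amount"]  # hypothetical names
    labels = ["latitude", "longitude"]                     # what we predict

    def input_fn():
        ds = tf.data.Dataset.from_tensor_slices(
            (dict(data[features]), data[labels].values))
        return ds.shuffle(1000).batch(32)

    model = tf.estimator.DNNRegressor(
        feature_columns=[tf.feature_column.numeric_column(f) for f in features],
        hidden_units=[64, 32],
        label_dimension=len(labels),       # regress both coordinates at once
        model_dir=f"gs://{bucket}/model")  # save the model to Cloud Storage
    model.train(input_fn, steps=1000)
    return f"gs://{bucket}/model"

# Turn the function into a reusable pipeline component (a container step).
train_op = comp.func_to_container_op(
    train, base_image="tensorflow/tensorflow:2.3.0")
```

And, as described next, those components are then chained into a pipeline function and compiled with the KFP DSL compiler. Again a minimal sketch, where `ingest_op` and `serve_op` stand in for the data loader and serving components:

```python
import kfp
import kfp.dsl as dsl

@dsl.pipeline(name="bird-migration",
              description="Ingest radar data, train a model, and serve it.")
def bird_pipeline(bucket: str = "my-bucket"):
    ingest = ingest_op(bucket)                 # data loader component
    training = train_op(bucket).after(ingest)  # training component from above
    serve_op(training.output)                  # serving component

# Compile the pipeline into a package that can be uploaded
# and run from the Kubeflow Pipelines UI.
kfp.compiler.Compiler().compile(bird_pipeline, "bird_pipeline.tar.gz")
```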
And then we have to deploy everything. This was the structure of the Kubeflow system. After doing all of this, we have to create that small pipeline that you can see there. This is the basic process that joins all the different methods we defined in the previous steps. As you can see, each element of the pipeline is defined as an operation: we have the data loader operation, we have the train operation, and finally we have the serving operation. Okay. All of these are parts of the pipeline, and this is the element that we execute in our Kubeflow environment to run the experiment. And finally, we have to compile it: we compile the pipeline's DSL code into a package that executes all the elements we defined in the pipeline. Okay. Perfect. We have our pipeline, and we can execute our system and see if we can train.

When we got to this step, we saw that, okay, we had changed the focus of our project. At the beginning, we wanted to create a model to detect variation in the behaviour of the different birds in the environment, and in the end we had built a big infrastructure to create different kinds of models. And when we started to test everything with the data that we had available, we thought, okay, what about climate change? We had focused on creating the pipelines, on creating the whole environment that we could share with other people, but we had moved away from the real objective. Maybe it was because of the data. We did not find a way to predict the global biosphere behaviour that was the objective at the beginning. Why? Maybe it was the data, because we don't know a lot about the data; maybe it was because we didn't have enough data.

But the good thing about this project, which is at an early stage and which we are still working on, is that we discovered that we can predict the change in the temperature according to the movement of the birds, we can predict the change in the humidity according to the movements of the birds, and, of course, we can model, using really simple machine learning models, extremely simple ones, how the birds move in the environment. But our final objective, which was to predict climate change, was in our opinion a bit too ambitious, because that is something abstract, and we don't have enough information to define the group of variables or attributes that would let us say whether the climate is changing because of human action or because of changes in the environment.

And the final conclusion that we reached is that we need more data, and we need to understand better the data that we currently have available. For that reason, we want to start working with different scientists and researchers who study bird movements and behaviour, to see whether it is possible to collect more data and more information, to include other kinds of species, or to really understand the meaning of the data that we are using. Which, in the end, was the most important lesson of this small project that we started some months ago: take a group of data and see what we can do with it.

And with this we reach the end of the talk. Thank you so much for being here during lunchtime, which is a bit difficult. If you have any questions, you can ask them here, or you can send us an email or contact us on the different social media.
And the last thing I'm going to ask is: if you have two minutes, we have an evaluation form, and we would really appreciate it if you fill in the form and answer the different questions we have there. And that is the last thing. Thank you so much for your attention.