Hello, thanks for coming, and welcome to this presentation about scalable online machine learning. First, let me introduce myself. I am Javier de Matías, and I work as a data engineer at Treelogic. Treelogic is a research-and-development-intensive company specializing in big data and artificial intelligence. We apply these technologies and algorithms in many target industries, such as banking, telecoms, healthcare and retail. Regarding big data, we can tackle all kinds of solutions, from complete data architectures to specific use cases, with an interest in every stage of the data lifecycle, from ingestion to processing, analytics and security. Regarding artificial intelligence, we aim to obtain the maximum value from data, with a special interest in deep learning and machine learning for use cases involving detection and classification problems.

How is my presentation organized? First, I will introduce what scalable online machine learning is and why we think it is so interesting now. Secondly, I will introduce the SOLMA library and how it can be used for this type of problem. Finally, I will show an industrial use case where we applied this library and this type of solution.

So, what is scalable online machine learning? The main idea is the convergence of two important trends. The first is stream processing, which is very popular now, and the second is machine learning, which is the main approach to most data analytics problems. For stream processing we have a good set of tools to implement solutions: data stream processing frameworks to build applications, and components like Kafka or event-driven queues to integrate the different pieces. We can implement jobs with Flink, Storm or Spark.
People think this is a good trend now, and with reason: companies need to process data in real time or near real time. But there is a problem when we have to integrate machine learning into these data stream processing pipelines. How do we solve that? The classical, most common solution is to divide the data workflow into two circuits. In the first circuit we create the machine learning model: we store historical data, we train a machine learning model on it, and we test whether the model is good enough. When we decide that the model is suitable, we deploy it into the second circuit, where the model is used, for example, to make predictions or classifications.

This is the solution we find when we search, for example, on Google. When I googled this topic a month ago, I found several different figures but always the same architecture. One is an Azure solution: a data lake stores the sensor events, the model is trained on this historical data and then deployed into the stream analytics component. Another reference architecture is similar: a stream processing bus stores the historical data, the model is trained with it, and when the model is suitable for production it is deployed into the streaming processing job. We can call this the classical solution, but it is not entirely satisfactory.
We can find several problems with this type of solution. First of all, we have to develop two data workflows. They share common parts, but they are different: as I explained before, there is a training workflow and a prediction workflow that exploits the machine learning model. This means we have to build more complex architectures, and a more complex architecture gives us more points of failure. And, most importantly, we do not have the most up-to-date model possible. For example, suppose I decide to train my machine learning model with the events collected during the last month, and the data trends change during that month. I may have a good machine learning model, but it is not ready to detect some of the patterns that are in the data now. We can shorten the time needed to train and deploy the model, but we will always be running an old version rather than the most up-to-date model possible.

These are the drawbacks of the classical solution. So how can we solve them, and what is the idea behind scalable online machine learning? The idea is to train and predict at the same time, within the same solution, in the same data workflow. This way we have a single data workflow with fewer points of failure, the data architecture we propose is simpler, and, most importantly, we always have the most up-to-date model possible.
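This train-and-predict-in-one-workflow idea is often called "test-then-train": every incoming sample is first used to make a prediction with the current model, and then immediately used to update it. Here is a minimal illustrative sketch, plain SGD on a linear model; this is a conceptual toy, not SOLMA code:

```python
# Minimal "test-then-train" loop: for every incoming sample we first
# predict with the current model, then immediately update the model
# with that same sample. Plain SGD on a linear model, for illustration.

def predict(weights, x):
    """Linear prediction with the current model."""
    return sum(w * xi for w, xi in zip(weights, x))

def update(weights, x, y, lr=0.01):
    """One SGD step on the squared error for a single sample."""
    error = predict(weights, x) - y
    return [w - lr * error * xi for w, xi in zip(weights, x)]

def online_loop(stream, n_features):
    """Process each (x, y) sample exactly once: predict, then train."""
    weights = [0.0] * n_features
    predictions = []
    for x, y in stream:
        predictions.append(predict(weights, x))  # predict first...
        weights = update(weights, x, y)          # ...then train on the sample
    return weights, predictions
```

Because the model is updated after every sample, predictions automatically track changes in the data trends, which is exactly the property the classical two-circuit solution lacks.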
We have the best model at every moment, fully adapted to the changes in the data trends. It seems to be a good idea, but what is the catch? It is difficult to implement something like this, because we face two important challenges.

The first challenge concerns the machine learning algorithms. Not all algorithms are suitable, because some of them need to process all the samples of the data set at the same time, and those cannot be used this way. And even when we find algorithms that can be adapted to this processing model, we do have to adapt them, because an online machine learning algorithm must be able to process each data element only once. So we can find many algorithms, but they have to be made ready to work this way.

The other challenge is a technical one. Consider this example of a Flink cluster. A Flink application is executed in a cluster: the job manager receives a job and parallelizes its tasks among the task managers, three task managers in this case. The problem appears when the task managers have to train the machine learning model: where is this model stored? All the task managers need to access it, and all of them need to be able to update it. The component we can use to solve this problem is the parameter server. A parameter server is a piece of the architecture where we store the machine learning model, with communication between it and the workers: the Spark executors if it is a Spark cluster, or the task managers in a Flink solution. The parameter server solves this problem. Now that we know what scalable online machine learning is and which challenges we need to tackle,
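The parameter-server pattern just described can be sketched in a few lines. This is a toy, single-process illustration of the two roles and the two operations; a real implementation such as Flink Parameter Server distributes this over the network, and the class and method names here are mine:

```python
# Toy parameter-server sketch: a shared store that workers read with
# pull() and update with push(). Single-process illustration only.

class ParameterServer:
    def __init__(self, n_features):
        self.weights = [0.0] * n_features

    def pull(self):
        """Return the most up-to-date version of the model."""
        return list(self.weights)

    def push(self, delta):
        """Apply a worker's update to the shared model."""
        self.weights = [w + d for w, d in zip(self.weights, delta)]

class Worker:
    """A worker (e.g. one task manager) that trains against the server."""
    def __init__(self, server, lr=0.05):
        self.server = server
        self.lr = lr

    def on_sample(self, x, y):
        weights = self.server.pull()                 # get the current model
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        delta = [-self.lr * error * xi for xi in x]  # local SGD step
        self.server.push(delta)                      # share the update
```

Several workers can share one server, so every worker always trains against, and contributes to, the same model.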
I will show a library that tries to solve this problem: the SOLMA library. SOLMA is an open-source library of online machine learning algorithms. It was created during the development of a European research project, and it contains two important things: first, a collection of online machine learning algorithms, and second, a set of abstractions that make it easy to develop new algorithms. Under this premise, SOLMA provides algorithms already adapted to this paradigm.

How does SOLMA solve the parameter server problem? It solves it with an open-source project called Flink Parameter Server. Flink Parameter Server is an abstraction for model-parallel machine learning, and it can be integrated into a Flink job: it is a solution we can use together with the Flink streaming API.

How does Flink Parameter Server work? There is communication between the parameter server and the workers. Each worker has an input data stream and an output data stream, and during its processing it may need to access the machine learning model. There are two operations for communicating between the parameter server and the worker. If we need to recover the most up-to-date version of the model,
we make a pull operation to obtain it; for example, in the training step we pull to get the most updated version of the model. And we make a push operation to update the machine learning model stored in the parameter server.

The communication between the worker and the parameter server is asynchronous. We can make a pull operation to request the model, but we have to wait until we receive the pull answer to actually obtain the model. For example, if the worker receives new labelled data for training, the worker has to pull the model from the parameter server, wait until it receives the pull answer, then train the model with the new data, and once this new version of the model is obtained, push it to the parameter server so that it is ready for the other workers. This is the basic way it works.

That is the interface between the workers and the parameter server, but how do we implement the interaction with the parameter server? We only have to implement two things. First, what do we do when we receive data? For that we implement the method onReceive: when a worker receives new data, how does it behave? Second, what do we do when we receive the model from the parameter server? That is handled like a callback, onPullReceive. With these two implementations we have the whole interaction between the worker and the parameter server, and, as I mentioned on an earlier slide, SOLMA provides abstractions that make this work easy.
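The two callbacks just described might look like this in a toy, single-process simulation. The method names follow the talk's description of the protocol, not the actual Flink Parameter Server API, and the server here answers pulls immediately instead of asynchronously:

```python
# Sketch of the two worker callbacks against a parameter server:
# on_receive buffers new data and requests the model; on_pull_receive
# is the callback fired when the pulled model arrives.

class TrainingWorker:
    def __init__(self, server):
        self.server = server
        self.pending = []  # samples buffered until the pulled model arrives

    def on_receive(self, sample):
        """Called for each new data element: buffer it, request the model."""
        self.pending.append(sample)
        self.server.request_pull(self)  # asynchronous in reality

    def on_pull_receive(self, model):
        """Callback for the pull answer: train on buffered data, push back."""
        for x, y in self.pending:
            error = sum(w * xi for w, xi in zip(model, x)) - y
            model = [w - 0.05 * error * xi for w, xi in zip(model, x)]
        self.pending.clear()
        self.server.push(model)

class SimpleServer:
    """Toy stand-in that answers every pull immediately and stores pushes."""
    def __init__(self, n_features):
        self.model = [0.0] * n_features

    def request_pull(self, worker):
        worker.on_pull_receive(list(self.model))

    def push(self, model):
        self.model = model
```

Everything else, the wiring of streams, the routing of pulls and pushes, is what the library's abstractions take care of.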
So how do we implement a new online machine learning algorithm with SOLMA? You only have to implement two things. The update method defines what we do when we receive new labelled data, for example how we update a Lasso model in this case; and the predict method defines how we make a prediction with the model. With only these two things, a new algorithm is implemented in the library; all the rest of the necessary implementation is included in SOLMA's abstractions. It is very easy to introduce new algorithms, because we only have to think about how to train the model and how to predict, processing each sample only once.

So that is the SOLMA library. Now that we know what scalable online machine learning is and what the SOLMA library provides, it is time to talk about a use case where we applied this technology. I will talk about an industrial use case, and it is about steel making. Steel makers produce many types of products, and the final products are very different: steel is used for car manufacturing, for turbines, or for cables. In this case we talk about producing steel coils. What is a steel coil? In simple terms, a steel coil is a very long and thin piece of steel, rolled up, something like this.
It is a very big roll of steel. And how is a steel coil produced? At the beginning we have a big block of steel, and after a process of temperature and pressure we obtain the final steel coil, a thin layer of steel. What is the key in the production of a steel coil? The key is the flatness: it is very important that the final product is flat enough to be suitable for the purpose it was made for.

The idea of this application is the following. We have a data stream processing program that receives the data from the sensors measured during the production of the steel coil, and we have to predict whether the final steel coil will be as flat as its final purpose requires. The challenge is that we only know the real flatness of the steel coil two or three minutes after the coil has been produced, and that is a problem, because the manufacturer needs to know it earlier, to decide, for example, to stop production or to downgrade the quality grade of the steel.

Another problem is that we receive fragmented data. We only receive the reading of one sensor at a time; we do not receive all the data in the same interval of time. During the processing of the steel coil we receive many data events, and the events have this format: the coil ID is the identifier of the coil; the variable ID identifies the variable that the sensor measures, which could be temperature, pressure or another important magnitude; the value is the value of this variable measured during the processing; and the coordinate is the position of this measurement within the complete piece of steel.
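The event format just described might be modelled like this; the field names and the comma-separated wire format are illustrative assumptions of mine, not the real Kafka message schema used in the project:

```python
# Illustrative model of a sensor event: coil identifier, variable
# identifier, measured value, and the coordinate of the measurement
# along the piece of steel. Field names and format are assumptions.

from dataclasses import dataclass

@dataclass
class SensorEvent:
    coil_id: str       # which coil this measurement belongs to
    var_id: str        # which variable was measured (e.g. "temperature")
    value: float       # the measured value
    coordinate: float  # position of the measurement along the coil

def parse_event(raw: str) -> SensorEvent:
    """Parse a simple comma-separated representation of an event."""
    coil_id, var_id, value, coordinate = raw.split(",")
    return SensorEvent(coil_id, var_id, float(value), float(coordinate))
```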
How do we receive the data in this problem? We receive the data in two Kafka topics. One of them carries the measurements from the sensors, and the other carries the flatness measured at the end, with a delay with respect to those real-time measurements. That is the input data. And what is the expected output? It is very simple: we expect an early prediction of the flatness of the final piece of steel, because it is not possible to know it during the processing.

The global solution is an application, in this case a Flink job, that consumes the two topics, one for the sensor measurements and the other for the flatness measurements, and produces another Kafka topic with the flatness predictions. These predictions are read by an operator in the factory, who can decide whether the steel coil will be good or not: maybe the coil will be of the best quality, or maybe two or three grades below that best grade.

That is the idea, but how does the application work? As I explained before, the data is fragmented: between the start of production of a coil and the end, we receive many measurement events. As we can see in this chronogram, we may receive from sensor 1 the value of variable 1 at coordinate x = 5, then other sensor readings at different times during the processing, and only after the coil is finished do we receive the real flatness measurements. Those real flatness measurements are the information we need to know whether we predicted well or not.

So when do we predict and when do we train the model? Each time we receive an input event, we make a prediction: with this value of a variable at this position, we estimate what the flatness will be.
And when do we train the model? We train the model when we have all the information, at the end: we receive the real flatness values, and with these values we can train on all the events received during the coil production. We have to take into account that we need enough memory to store the input events until the final flatness arrives, because we need to label those input measurements with the real flatness values. Once we have this labelled information, we can train the model and update it in the parameter server. That is how the whole application works.

And what are the results of this application? This figure is a comparison between our prediction, in the top plot, and the real flatness measurement, in the bottom plot. Looking at the detail, we can see that we cannot recover the absolute value of the flatness, but we can detect the trends, the changes, in the flatness measurements. Maybe that is not enough to get an absolute value, but it is enough for the operator to decide whether the coil is good or not, and what the quality of the final coil is. Those are the results of our solution.

What is the conclusion about this technique? Our conclusion is that machine learning in streaming applications is becoming fully integrated, and there are many solutions available now to use it. Scalable online machine learning could be a good approach: it is promising and it seems well suited to the problem. The difficulty right now is that an important piece of this architecture, the parameter server, still has to evolve and to integrate better into today's data processing frameworks. We have solutions like Flink Parameter Server, or equivalent solutions for Spark, but they are not mature enough yet.
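The buffer-until-label logic just described, holding a coil's sensor events in memory until the delayed flatness measurement arrives, can be sketched as follows; the class and method names are hypothetical, not the actual job's code:

```python
# Sketch of buffering fragmented sensor events per coil until the
# delayed flatness label arrives, then emitting labelled training pairs.

from collections import defaultdict

class CoilBuffer:
    def __init__(self):
        self.events = defaultdict(list)  # coil_id -> buffered feature vectors

    def on_sensor_event(self, coil_id, features):
        """Store each fragment of sensor data until the label is known."""
        self.events[coil_id].append(features)

    def on_flatness_measurement(self, coil_id, flatness):
        """The delayed label arrived: emit labelled samples, free memory."""
        labelled = [(x, flatness) for x in self.events.pop(coil_id, [])]
        return labelled  # these (features, label) pairs feed the training step
```

The memory cost mentioned in the talk shows up here directly: every event for a coil stays in `self.events` until that coil's flatness measurement arrives.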
In the future, maybe it will be necessary to find technologies that implement this with better performance, better adapted to all the situations we can find in this type of problem. For example, Flink Parameter Server is a very interesting open-source project, but it is supported by only two or three open-source contributors; they keep it in a GitHub repository, and you can use it, but it is an experimental solution. Likewise, the industrial use case I have explained is only a prototype, because it is very difficult to deploy it in a production environment. Maybe when mature parameter server solutions appear, we will be able to build good solutions for this type of scalable online machine learning. OK, that is all. Thank you very much for your attention, and I await your questions.

[Audience] I have heard that parameter servers can be slow sometimes, because when you have a lot of parameters to deploy there is a lot of traffic in the cluster. How do you see this performance issue?

Yes, that is the main problem with the parameter server: the communication between the workers and the parameter server can become a bottleneck when there is a lot of traffic between them. This is one of the reasons why it is an experimental solution: you can run into problems when there is a lot of traffic. You may even run into deadlocks, because the communication is asynchronous: maybe the parameter server is waiting to receive an answer from a worker while the worker is waiting for something from the server, and you end up with a deadlock. That is the main reason it remains an experimental solution.

[Host] OK, more questions? Thank you very much.