Can you hear my voice? Okay. I know other people are still joining; it's lunchtime, I can understand. So, first of all, apologies: this talk is in English. I speak Spanish as well, but in the last years I have had to switch, for family and business purposes, between Italian, Russian and English, so that can be tricky. Let's go back to the topic.

Just a couple of words about me. I'm currently based in Dublin, Ireland. Last February I joined MSD, which is one of the top five biotech and pharmaceutical companies in the world. If someone in the audience comes from North America, they probably know the company as Merck. I'm part of MMD IT, which stands for Merck Manufacturing Division Information Technology. What I'm trying to do is help the company in its data and digital journey, in particular identifying use cases for deep learning and machine learning in the manufacturing space, although not restricted to this: we have other use cases in the supply chain and in business processes as well. There is a pizza in the slides because I like cooking, and also eating; if you want to ask me for some tips on how to prepare pizza, you're welcome.

A few numbers about the company in Ireland: it has been present in the country for more than 50 years now, with five sites in four different locations, two of them in Dublin (in north Dublin, I mean). But really my role is more at the EMEA level, so at the moment I'm also dealing with projects in Switzerland and other countries. If you want more information, you can go to the website.

A couple of words about the Dublin tech hub, which is one of the most important tech hubs in Europe at the moment. I know that some politicians, in particular from the country I come from, Italy, sometimes refer to Ireland as a fiscal haven. That isn't true, because the government gives a lot of facilitation to companies, whatever their size (a big company, a startup, a medium-sized company), but everything is regulated: you have to create jobs, you have to do research and development. So a lot of things are happening, in particular in IT, data science, life sciences, financial services and banking. It's really a challenging and interesting place to work.

The core topic today is how to do distributed deep learning using some of the most popular frameworks for Python, but then doing the training on Apache Spark, which runs on the JVM. It's a totally different environment, so you have to deal with different problems, which I'm going to describe, and I'll also explain why sometimes you have to do this, along with a solution that has worked for us over the past three years.

I'm pretty sure that you already know what deep learning is, so I just put in a couple of slides to explain it in layman's terms. It is a specialization of machine learning where, basically, you use deep neural networks. A neural network typically has an input layer, an output layer and different combinations of inner layers, called hidden layers; the combination defines the particular class of non-linear problems you want to solve. The life cycle of deep learning is similar to that of machine learning: you prepare your data (your training and testing data), start training your algorithm, and validate it as well; when you're happy with the performance, you move it to production and start to do inference on new data, and you go back if you're not happy or if you spot some room for improvement.
So, as I mentioned, there are different flavors of neural networks, depending on the specific problem you want to solve. For example, as you can see here, you typically use LSTMs in all those cases which involve time (for example, if you want to do forecasting on time series), while you traditionally use a CNN for whatever is related to object recognition or object classification, and so on.

In terms of programming languages for deep learning, of course data scientists mostly prefer Python, and for more than one reason: it's easy to learn, and there is plenty of availability of libraries and frameworks for everything, including deep learning. In the chart on the left you can see the stats coming from KDnuggets about the power scores of Python frameworks for deep learning across 2018. As you can see, TensorFlow and Keras are the top two. Looking at the data for 2019 (these are the data for the first six months, but it's the same up to the end of September anyway), TensorFlow is still on top, and you can see that PyTorch is gaining positions and is now in second place, because there is now much more engagement from Facebook in trying to build a bigger community around PyTorch. Whether PyTorch is ready for production is still to be verified, because TensorFlow has been in the market for a long time, is more stable, and there are plenty of use cases in the real world. The fact that Keras is in third place doesn't mean that it's becoming less popular; the main reason, to me, is that Keras is merging into TensorFlow, so Keras is now available as a high-level API in TensorFlow. That's why, for 2019, you see PyTorch in second place.

Just a quick poll: how many people know what TensorFlow is, or use TensorFlow? Oh, that's good. And how many people use Keras, or Keras as a high-level API for TensorFlow? Okay, that's a good number. So, TensorFlow, as you know, is open source and has been released by Google. There is a huge community behind this framework, so it's very popular. It's much more of an end-to-end framework: it's not just for machine learning or deep learning, you have everything from data preparation to putting things into production, monitoring, and so on. Probably one weak point of TensorFlow is that the low-level APIs are really hard to assimilate, so people starting with this framework for the first time can find it really difficult to become experts. That's why the idea of the authors of Keras was to provide something that is specific to deep learning but with a very high-level API, so that the entry point to the framework is smoother and people can just focus on their specific models and specific problems. Of course, both allow the execution of code on top of CPUs or GPUs, and Keras runs on top of different backends, including TensorFlow. In my experience I had use cases of Keras running on top of a TensorFlow backend; I never did anything with different backends, so I can't compare their performance.

This is the original high-level architecture of TensorFlow, up to release 1.13. You can see there is a distributed execution engine and different levels of APIs.
At the low level there are the real TensorFlow APIs. You're probably wondering why Python in this chart is in orange while the other languages are in black: it's because Python, so far, is the only programming language covered by the stability guarantee from Google. If you try to do things in Java, for example, or in Go, and you want to move them to production, be careful, because you could have some problems; so I would not encourage people to use those APIs. Do things in Python. Then you have other levels of APIs, and starting from TensorFlow 1.14, and even more with TensorFlow 2.0, Keras is part of the framework itself, so you can use Keras as a TensorFlow high-level API and then use whatever is behind it. It means an easier entry point for people starting with this framework. Today I'm going to talk about how to do distributed deep learning for these two frameworks; with the solution I'm going to introduce there is no support for PyTorch yet, and I don't know if it will be introduced in the future.

Another quick poll: how many people use Apache Spark in their projects? Okay. And how many people at least know what Apache Spark is? Okay, a good number. Anyway, I put in this slide: Spark is a distributed engine for unified analytics that can run on top of a commodity cluster, and it's pretty fast because by default it processes things in memory across all of the nodes of the cluster. It has support for different languages, including Python, Java and Scala, and there are different ways you can deploy Spark: you can run things in the cloud (on Databricks, on Azure, on EMR in Amazon), or you can have your own on-premise installation of Spark using Kubernetes or YARN, or Apache Mesos, which is probably the worst way to deploy a Spark cluster.

What are the use cases where you really need to train your models on top of something like Apache Spark, in a distributed fashion? Typically, when your network model is so big that you can't expect to train it on a single machine, even if that machine has lots of GPUs, or when you have a huge dataset. Just to put some reference: I worked on some use cases in fraud, waste and abuse for a big healthcare provider in the United States. This company has to process a trillion claims per year, and when you try to do machine learning and deep learning to spot potential signs of fraud, waste and abuse, you have to combine this data with the providers' data, the customers' data, data from government agencies and other data from third-party providers. You can imagine that with a dataset like this you can't do training on a single machine with a few GPUs, so you have to do something on top of Spark.

Then, scarcity of GPUs. This was a problem up to last year, because there was some sort of bad mindset when you had to ask for budget to buy GPUs for a project: if there was no "data science" in your team's name (I mean, if it was the fraud, waste and abuse team doing data science), the business would say: why do you need this money for GPUs? You have different servers with CPUs, why do you need this? Now things are changing, but anyway, if you have availability of clusters of machines, chances are it's better to deploy Spark there, or use some cloud distribution such as Databricks. And there's a spoiler here about the next slide.
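To give an idea of the Spark side in practice (this is just a minimal sketch in Scala, not code from my slides), an application talks to the cluster through a SparkSession and expresses its work as transformations over distributed collections, which Spark executes in memory across the nodes:

```scala
import org.apache.spark.sql.SparkSession

object SparkHello {
  def main(args: Array[String]): Unit = {
    // master("local[*]") uses all local cores; on a real cluster the master URL
    // is usually supplied externally via spark-submit
    val spark = SparkSession.builder()
      .appName("spark-hello")
      .master("local[*]")
      .getOrCreate()

    // A trivial distributed computation: the data is partitioned across the
    // cluster and summed in memory on each node
    val numbers = spark.sparkContext.parallelize(1 to 1000000)
    println(s"sum = ${numbers.sum()}")

    spark.stop()
  }
}
```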
So, about the solution I'm going to propose today: I'm going to talk about distributing these models on Apache Spark, but the same approach also works if you want to do things in parallel on a single machine with multiple GPUs. You use the same code; what is different is that you use a different training master, a specific class called ParallelWrapper. So, in case you have enough GPUs, you can still use the tech stack I'm going to present.

There are technical challenges in doing this, as you can see in this slide. First of all, the execution model of Spark is different from the execution model of most of the available frameworks for deep learning, in particular those implemented in Python. In Spark, as you know, you have stages, and for each stage you have a given number of tasks that run in parallel. If one of these tasks goes down, it's up to Spark to try to restore it, and this doesn't affect what the other tasks are doing. This model is good in general, but it's bad for training a neural network, because if a task goes down, even if Spark restores it, you have to start the training from scratch, and this means spending more time and more money, and delivering results with a significant delay to your customers.

GPU configuration and management can sometimes be a nightmare, because the deep learning frameworks allow you to run code on top of GPUs, but when you then have to manage the infrastructure, you have very few utilities to do this, and it can be really challenging.

Last but not least, you have to pick one between performance and accuracy. If you run things with Spark (just using the Spark API and Spark components, running on top of the JVM only, as with traditional Java or Scala applications) and you want to achieve the best performance, you will have to sacrifice something in terms of accuracy, because the JVM has to do some rounding, some approximation, which in the end will affect the accuracy of your model. Vice versa, if you prefer to achieve the best accuracy, you have to sacrifice something in performance, which means your training could last days, and probably that's not something you can afford.

And these are not the only challenges; there are non-technical ones too. In my career I've been in the position of managing teams of software engineers and data scientists. These two groups have different skills and a different mindset. While software engineers think much more about scalability, performance and putting things that work into production, data scientists often have a mindset more oriented to research. So they say: okay, I built this model, it's a fancy model, I checked that its accuracy is 98 percent. Yes, but then we have to move it to production using the real data and run it in parallel on different environments, and then you have to rewrite everything from scratch. Some companies tried in the past to create some unicorns, but this doesn't work.
So I believe that data scientists should be data scientists, with a little flavor of understanding of the DevOps space, and software engineers should be software engineers, with a little understanding of the data science world, and you should try to find something in your infrastructure that can act as some sort of common layer, a common language, between the two teams. This is how we solved this problem. We started this in 2017; now you probably have more options, but this is something that still works today.

Deeplearning4j is the open source framework that we introduced in our tech stack. It's an open source framework for doing deep learning on top of the JVM, so it's specific to JVM programming languages such as Java, Scala, Kotlin, you name it, and of course, since the first release, it has been natively integrated with Hadoop and Apache Spark. It means that you don't have to reinvent the wheel if you need to do things with Hadoop and Spark. It doesn't mean that you are forced to use Hadoop (I have use cases with the data in Amazon S3, for example), and you don't have to use Apache Spark either if you don't want to. You have very high-level APIs, so the entry point is good, and if you run things on a machine with multiple GPUs, you can still use the same framework. As I mentioned, the same code can run on CPUs, GPUs or a mixed environment without changing your code: you just have to change the configuration of your application and add the specific dependency for CPUs or GPUs.

This is how it works. Basically, you have data scientists working in Python with their framework of choice; in this example it's Keras, which was the first one supported by Deeplearning4j, with any flavor of backend. They can use any other Python libraries they want (pandas, NumPy, whatever), that's fine, and all of the best practices they know in Python for implementing models still apply. What is different is that at the end, when they are happy with the result, they need to export the model in a serialized form, commit it (put it under version control), and then import it into Deeplearning4j; we'll see in a few minutes how this happens programmatically. It means that from there on, downstream, everything happens in an environment which is mostly based on things that run on the JVM. Chances are that in production you have Spark, Hadoop, Elasticsearch, Solr, StreamSets Data Collector, Kafka, whatever: all things implemented in Scala or Java that run on the JVM. So if you have a mature continuous delivery pipeline for building, testing and delivering things, you can keep it, and just take the pre-trained Python models, or do transfer learning on top of your JVM infrastructure, with minimal impact and a lot of automation you can put on top of this. Deeplearning4j later added support for TensorFlow models too, or a combination of the two, if you use the Keras API on top of TensorFlow.

A couple more notes before getting into the first example. Deeplearning4j is a modular framework, so you don't need to import all of the modules (there are many) into your project, just those that you really need. I'm not going through the details of this in the interest of time.
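To make the dependency point concrete, here is a minimal sketch of what this looks like in an sbt build (the artifact coordinates are the real DL4J/ND4J ones, but the version is just an example; check the current release):

```scala
// build.sbt (hypothetical project)
val dl4jVersion = "1.0.0-beta4" // example version only

libraryDependencies ++= Seq(
  // Only the modules you actually need, since DL4J is modular
  "org.deeplearning4j" % "deeplearning4j-core" % dl4jVersion,

  // CPU backend:
  "org.nd4j" % "nd4j-native-platform" % dl4jVersion

  // ...or GPU backend instead, with no changes to the application code:
  // "org.nd4j" % "nd4j-cuda-10.1-platform" % dl4jVersion
)
```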
I just want to put the accent on ND4J. ND4J was born as a component of Deeplearning4j, but it is now some sort of standalone project: a library whose core has been implemented in C++ and CUDA, depending on whether you want to use it on CPUs or GPUs, but which exposes its APIs in Scala, so you can use it in a Java project as well. Basically, this library was created to fill the gap between Python and the JVM programming languages in terms of the availability of tools and libraries for linear algebra and matrix manipulation. And it's pretty fast. The syntax is very close to that of NumPy: if you're familiar with NumPy, you can start using ND4J in Scala or Java without problems.

So we have three different technologies that, put together, are a powerful combination; each one brings something to the plate. The first one is Deeplearning4j, which has a very high-level API, so it's the perfect entry point for software engineers to understand things in deep learning and focus on their particular problem, without needing to know anything about the math behind the data science. Then you have Apache Spark, which, as you know, is the best in terms of performance for distributed computation, whether you do things in batch or streaming fashion. But when it comes to training multilayer neural networks there is the problem I mentioned before about picking one between performance and accuracy, and this can be solved if you have something that doesn't run only on the JVM but at a lower level, very close to your hardware. This happens in Deeplearning4j through ND4J: implicitly or explicitly, it always uses ND4J, whose core is implemented in C++ and CUDA, so most of the things happen very close to the processors and the hardware in your infrastructure. That's why this is a powerful combination.
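To give a feel for how close ND4J is to NumPy, here is a minimal sketch (again, not from the slides):

```scala
import org.nd4j.linalg.factory.Nd4j

// A 2x2 matrix from a flat array, NumPy-style
val a = Nd4j.create(Array(1.0, 2.0, 3.0, 4.0), Array(2, 2))
val b = Nd4j.ones(2, 2) // like np.ones((2, 2))

val sum = a.add(b)   // element-wise addition, like a + b
val prod = a.mmul(b) // matrix multiplication, like a @ b

// The arrays live in off-heap buffers managed by the C++ (or CUDA) core
println(prod)
```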
And this is how the process happens. As you can see, the flow is pretty similar whether you're using Keras or TensorFlow. The data scientists implement and train their models using their framework of choice, Keras or TensorFlow; then, when they're happy, they save the serialized model in the specific format of that framework. What is different is the Java side, and that's why you see different icons, Python and Java, in this chart. The class you have to use to import the model is KerasModelImport for Keras and TFGraphMapper for TensorFlow, and then the rest of the flow is similar for both: you use the same API, the same classes. You load new data, pre-process it if needed, and start to do inference. And if you're unhappy with the results when you run these things on the Java or Scala side, you can still do transfer learning over there and modify the model as well. We'll see now an example for Keras.

For people familiar with Keras, I put up this slide to show the typical elements of the Keras framework, the different categories. At the present time, the current implementation of Deeplearning4j covers about 95% of these concepts, and so far I have never encountered a case where my team had to go back to the design and say: you probably have to change something because this is not supported by Deeplearning4j. So it's very comprehensive, and anyway the remaining 5% is going to be covered in the next months by the framework maintainers; at the moment Deeplearning4j is maintained by the Eclipse Foundation.

This is how it happens in code. In this code example, to make things simpler, I'm importing one of the models available in the Keras zoo, the VGG16 model, which is a convolutional neural network, and I'm importing the ImageNet weights. Then, in the code below, I test that everything is working fine. When I'm happy with this (and the same applies if you train the model yourself in Python), I have to save the serialized model. In this example I split the configuration of the model and the weights into two different files, but as you know, in Keras you can also put them together in a single file; whatever your choice is, Deeplearning4j supports both. So I have a JSON file for the configuration of the model and an HDF5 (.h5) file for the weights, and I put these under version control. Then, on the software engineering side, there is an application, in this case in Scala, but it's pretty much the same in Java: you import the model configuration and the weights, and then use the KerasModelImport class to bring this model into your application. The third argument of this method is set to false: it means that, in case you do transfer learning, the process shouldn't change the original weights of the model. If you set it to true, it means that when I do transfer learning I want to then release, and put under version control, a new model with updated weights, doing this on the Java or Scala front. Then I load other images (because this is an example of image classification), transform them into a multi-dimensional array, because that's what the model expects, and start to do inference, in this case using the output method. You can also decode the output of your prediction to make it more human-readable: for some of these pre-trained models, the ones that are part of the Deeplearning4j zoo, there are utilities to automatically decode the result.
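Here is a minimal sketch of that Scala import-and-inference flow (the file names are placeholders; the classes are the DL4J Keras-import API as I used it, so double-check against the version you have):

```scala
import java.io.File
import org.datavec.image.loader.NativeImageLoader
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport
import org.nd4j.linalg.dataset.api.preprocessor.VGG16ImagePreProcessor

// Import the model exported from Python: JSON config plus HDF5 weights.
// The third argument is the flag discussed above, set to false here.
val model = KerasModelImport.importKerasModelAndWeights(
  "vgg16_config.json", "vgg16_weights.h5", false)

// Load an image and turn it into the multi-dimensional array VGG16 expects
val loader = new NativeImageLoader(224, 224, 3)
val image = loader.asMatrix(new File("car.jpg"))
new VGG16ImagePreProcessor().transform(image)

// Inference; the raw output can then be decoded into human-readable labels
val predictions = model.output(image)
```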
For models without those utilities, you have to write a few lines of code yourself to show the result to your final users. When I have finished this process, I can save the serialized model from Deeplearning4j in its specific format, a zip file, and this file can then be distributed across multiple Java, Scala or Kotlin applications, whatever JVM language you use. In this example, which is taken from my book, I created a very simple micro web application that uses that model. I used the framework called Spark Java (no relation to Apache Spark) to implement the microservice and the micro web application, but you can use any other framework available for the JVM languages. I picked an image from the web (that's not my car) and uploaded it to this fancy UI (I'm not a web developer), and the system is telling me that with almost 77% confidence this is a sports car, along with other potential results that are meaningful for the particular image that has been uploaded. If you're not happy, you say: okay, 77% is not a good result to me, so I probably need to pick and prepare another dataset and retrain the model. And if you pick a huge dataset and want to do things in Spark, you go back to your Java code.

I'm skipping these two slides for a moment. Basically, there are very few classes you have to be familiar with if you want to do the training on Spark with Deeplearning4j. The first one is the training master. This is basically the implementation you're going to use of, let's say, the component that does the training for you on top of an existing Apache Spark cluster, and there are at the moment two different implementations of it. Going back to this slide, this is what happens behind the scenes: parameter averaging and asynchronous stochastic gradient sharing. The good news is that you don't have to know what is happening at the low level here, because the high-level API allows you to just use the Deeplearning4j API and forget about these details. If you are curious enough and want to understand what's going on in your training, you can go back to it, but anyway, you can probably delegate this activity to the people who maintain the infrastructure, or just pull metrics to understand the performance of your application.
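In code, configuring the parameter averaging flavor of the training master looks roughly like this (a minimal sketch; the numbers are illustrative, not recommendations):

```scala
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster

// The builder argument is the number of examples in each DataSet object of the RDD
val trainingMaster = new ParameterAveragingTrainingMaster.Builder(32)
  .batchSizePerWorker(32)      // minibatch size used by each Spark worker
  .averagingFrequency(5)       // average parameters across workers every 5 minibatches
  .workerPrefetchNumBatches(2) // prefetch batches asynchronously on the workers
  .build()
```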
After the training master, there are two classes available to represent your model in a distributed environment: SparkDl4jMultiLayer and SparkComputationGraph, which are wrappers of the MultiLayerNetwork and ComputationGraph classes from the same framework. Whatever your network implementation is (a particular CNN such as an ImageNet model or a specific custom CNN, an LSTM, or some other neural network), you will always use one of those two classes. This makes it a very powerful high-level framework, because you don't have to reinvent the wheel, use a model-specific class and work out which one applies to your custom model.

The last classes you have to consider are the RDDs of DataSet and MultiDataSet. By the way, for people familiar with Spark: DataSet here is not the Dataset from the Apache Spark API, it's the DataSet interface for data structures from ND4J. Whatever you do, you're going to use, implicitly or explicitly, an RDD of DataSet or MultiDataSet, and because this is implemented in ND4J, it means it runs in C++ or CUDA, very close to the hardware of your nodes. That's why this becomes very powerful.

Probably someone is asking: why RDDs and not DataFrames? I saw some faces. If you think about it, deep learning is done to address non-linear problems, so chances are you are dealing with unstructured data: it could be images, videos, audio, or some other unstructured data coming from medical devices, for example. In order to pre-process this data, you don't need a tabular representation as the final format; in the end you need tensors. DataFrames in Spark come with an extra structure on top of RDDs, specific to putting data in tabular format, which would make performance really bad if you did this for deep learning, and typically you won't need it. In other Spark applications, it's true, I suggest not using RDDs and using DataFrames instead; in this case, though, RDDs are the optimized choice, because with RDDs you're running things very close to your GPUs or CPUs. That's the reason.

In order to retrain the model, going back to the code: I'm getting the configuration that was imported into the Scala object, setting up the Spark context and configuration, and then, in this case, I'm using a parameter averaging training master to train. Then I create an instance of the network for Spark, using the Spark context, the model configuration imported from Python and the training master, and I start doing training and evaluation with the Deeplearning4j API. In this case I cut the code because it was a little bit longer, but anyway, there are specific fit methods for your data, in this case images. I also skipped the part about pre-processing them, because that would require another 20 minutes of talk just for that topic.

And this process is repeatable: whatever your problem is, you're writing the same code. So in the end we created some templates and made this part of our automated process: any time there was a new commit of a new model in our GitHub Enterprise, these classes were automatically updated and a new application was created for that specific problem, with different templates depending also on the training data, whether images or other types of data.
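This is the kind of repeatable skeleton we ended up templating. As a minimal sketch (the epoch count and configuration values are illustrative, and the RDDs of ND4J DataSets are assumed to be prepared elsewhere):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.deeplearning4j.nn.conf.MultiLayerConfiguration
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster
import org.nd4j.linalg.dataset.DataSet

def trainAndEvaluate(sc: SparkContext,
                     conf: MultiLayerConfiguration, // e.g. imported from Keras
                     trainData: RDD[DataSet],
                     testData: RDD[DataSet]): Unit = {
  // Training master configured as sketched earlier
  val trainingMaster = new ParameterAveragingTrainingMaster.Builder(32)
    .batchSizePerWorker(32)
    .averagingFrequency(5)
    .build()

  // Wrapper that makes the network trainable on the Spark cluster
  val sparkNet = new SparkDl4jMultiLayer(sc, conf, trainingMaster)

  for (_ <- 0 until 5) // one fit call per epoch
    sparkNet.fit(trainData.toJavaRDD())

  // Distributed evaluation on the test set
  val eval = sparkNet.evaluate(testData.toJavaRDD())
  println(eval.stats()) // accuracy, precision, recall, F1
}
```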
Anyway, the support is there in Deeplearning4j: the DataVec module allows you to transform everything into tensors, whatever the input is. If you have images, or documents (we also have to process a lot of PDFs, for example, so binary content), you can still use DataVec and then do the same things with the same API.

Deeplearning4j also comes with some visualization facilities. This is an example: it's the entry point of the UI you get when you train your models. There are lots of pages with a lot of charts, really useful also for data scientists to understand how the training is going. If you spot something that doesn't look as expected, you can stop the training, fix things (it could be in the configuration or somewhere else) and run the training again. In this case you save time, because this information comes in real time.

Among the pages available in this application there is also the possibility to check resource usage. I took this snapshot, which is for a local, single-node Spark cluster running on a CPU, and in the chart at the top you can see two graphs. There is a red line, pretty much constant over time, which represents the utilization of JVM memory during this training, while the blue line represents the utilization of off-heap memory for the same training in the same interval; as you can see, it goes up and down. This is a case where the training is going very well, but I deliberately put in a bad memory workspace configuration so that you can see the fluctuation in the utilization of off-heap memory. Why does this happen? If you recall the implementation of ND4J: it is implemented in C++ and CUDA but runs on the JVM, so you have to expect most of your objects to live in off-heap memory. This is the main difference when implementing a Spark application for this kind of training compared to a traditional Spark or Java/Scala application. You have to pay a little more attention to this, but the rest of the best practices you have for coding and monitoring applications are still valid. This is the only thing which is really new, and a little bit unusual for a Java/Scala developer; it would require another hour or two to talk about it properly, but this is a good entry point: if you start doing things this way, take care of the off-heap memory. In some cases you can also reduce the amount of memory reserved for the heap and use much more for the off-heap; that's a good strategy to start with.
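For completeness, here is a minimal sketch of how the training UI is typically attached, plus the off-heap knobs I mentioned (the package names are from the 1.0.0-beta era of DL4J and the flag values are purely illustrative, so verify them against your version):

```scala
import org.deeplearning4j.api.storage.StatsStorage
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.ui.api.UIServer
import org.deeplearning4j.ui.stats.StatsListener
import org.deeplearning4j.ui.storage.InMemoryStatsStorage

// Attach the DL4J training UI to a network; then browse to http://localhost:9000
def attachUi(network: MultiLayerNetwork): Unit = {
  val uiServer = UIServer.getInstance()
  val statsStorage: StatsStorage = new InMemoryStatsStorage()
  uiServer.attach(statsStorage)
  network.setListeners(new StatsListener(statsStorage))
}

// ND4J allocates tensors off-heap (via JavaCPP), so it often pays to shrink the
// JVM heap and raise the off-heap limits, e.g. (illustrative values):
//   -Xms4G -Xmx4G -Dorg.bytedeco.javacpp.maxbytes=12G -Dorg.bytedeco.javacpp.maxphysicalbytes=16G
```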
I have collected all of this information (I know you probably have a lot of questions and other curiosities) in my book, which is doing very well on torrent sites as well, so people are downloading it. Anyway, it's a good sign of recognition: people like the book, and it has been ranked among the top seven Apache Spark books for 2019. I was surprised; was it fourth place or sixth? Above it there is one by Matei Zaharia, the guy who created Spark, so I don't know why people are putting more faith in my book. That's good, anyway.

I forgot to add here a link to my GitHub repository. There are a couple of examples. One is training with a different kind of network: there is a Python example in Keras that I found on GitHub and adapted a little bit, and then I implemented the Java code to do the same; it's an LSTM case. And then there is an example of how to import a TensorFlow model, which should also work with TensorFlow 2.x: some sort of wrapper that I implemented while working with the StreamSets guys to add an extra component to the StreamSets Data Collector. You can play with it, import it into your application, and see how it works with your specific model when you run it on the JVM.

So, this is all I have for you today, and I'm open to questions. Feel free to contact me here (I'll be around until 5:30 or 6 p.m.), or reach me on LinkedIn, on Twitter, or through my blog. I would also like to have feedback from people. I did a similar talk a month or so ago in Moscow and received a lot of feedback, with people asking for help and support on how to do things this way, or with other problems, using different frameworks or different tools. I'm trying to collect all these use cases, and as soon as I'm done with traveling to conferences for my job, I will put all of this on my blog across 2020, so stay tuned. I also want to track the evolution of, and the alternatives to, this approach, because in 2017 this was the only viable solution (engineers at CERN also came up with a similar one), but now new frameworks and technologies are coming up, and in particular, if you have availability of GPUs, there will be some valid alternatives. Thank you again, and thank you very much for coming.

So, any questions? There are a couple of people there. These lights are blinding, so I hope I can see your faces as well.

"In my company, a year ago, we were debating about using Deeplearning4j, but there was also a lot of talk about BigDL from Intel. Any opinion about that?"

Yeah, I forgot to mention that. When we started this in 2017, we did a comparison between three different frameworks for deep learning in Scala: Deeplearning4j, BigDL from Intel, and another one from ThoughtWorks whose name was ScalaDL or something similar. We ended up with this one because BigDL is good in performance if you have Intel hardware, but if you switch to other hardware, which was our case, it was really bad. Another reason was that, in terms of importing Python models, BigDL is a little bit tricky: with our approach the data scientists just use their libraries of choice and version the serialized model, and they can forget about Deeplearning4j, while with BigDL they would have to learn the framework itself. The other one, from ThoughtWorks, we discarded because it was pretty new, not yet ready for production, and also implemented by data scientists for data scientists, while this one was born for software engineers: people who have to put this in production don't need to have any clue about gradients, derivatives, or whatever matrix math is behind it.

"Thank you for an interesting presentation. How can Deeplearning4j be affected or improved by Project Hydrogen for Spark?"

Thank you. Good point, but yeah, this will probably come to light with Spark 3.0, and at the moment there is no deadline. I talked with Holden Karau and with Tom Graves.
They don't know yet. It could be that some of these things become obsolete. One thing that will probably remain valid is the import of the Python models; the Deeplearning4j APIs themselves, if you want to do everything from scratch in Java or Scala, could in time be something that gets discarded. I also talked with Holden Karau about the possibility of bringing this API into Spark after Spark 3.0. I don't know where that conversation will go, but that's a good point.

"So this would be merged into Spark?" Yeah, I made this proposal, but I don't know; anyway, we're waiting. The problem is that at the moment there is no deadline for Spark 3.0. I know what has been implemented there, but nobody knows when it will arrive. So if you have something in production now, or had to do this over the last three years, there was no viable alternative. Thanks. Thank you.

Okay, we're running out of time, so thank you all, and feel free to reach me. Thank you.