Cool. Hi guys, I'm Sharon. I'm working as a senior developer in artificial intelligence at CoWrks. Today I'm going to talk about PyTorch, on the basis of why I switched. We used to use TensorFlow; my team still uses TensorFlow, and I'm the only one who took the risk of moving to PyTorch. I'm trying to convince them, and I'm trying to convince you guys.

So, again, I work at CoWrks, and we have a small research team that we call CoWrks Labs. We do research on middle-out vectors; you can go to the project's github.io page and see what it is. It's in the area of word embeddings. I go by hhsecond on the internet, and why is a whole other story, but wherever I am, you can find me as hhsecond.

Okay, let's start. This is the gist of my talk. When I started preparing it, I thought the major difference between PyTorch and all the popular deep learning frameworks is its dynamic capabilities, so I thought I should start with what a computational graph is and what the different kinds of computational graphs are. We'll go through that, and then I'll give a general introduction to PyTorch so that you get comfortable with the amount of code I'll put across in the next session. The next session is where the actual talk starts: why I switched, and why you should switch. Then I have some bonus points, but only if I have time, so if you want me to get to the bonus points, be alert and attentive, or at least give me that face.

Okay, cool. As you are probably aware, what we are doing with deep learning is trying to optimize some giant mathematical function, and we're trying to find the parameters which actually solve that function and give you a result. For any problem we find in the universe, we try to find a function: we set up a deep learning network and optimize the parameters. That's all there is to it. So it's all math, right? You could just do it by hand. The question is: do you want to do it by hand? This is a small snippet I took from the original LSTM paper of 1997, and this is the complexity of the mathematical equations you would have to solve to get the solution of the function you're optimizing, and that's just a small snippet from the paper.

So there should be a better way, right? That's how people came up with the graph approach: you picture the whole function you have as a graph, and you can do a lot more with it, or at the very least it's intuitive. When you look at the graph, you know exactly how the whole flow happens, much better than when you look at the equation. Here it's a hyperbolic tangent applied to the sum of two matrix multiplication results. This becomes extremely useful when you start dealing with more complex graphs, and the complexity grows exponentially as you start building a really big graph.
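To make that node concrete, here is a minimal sketch of the computation that picture represents: a tanh applied to the sum of two matrix multiplication results. The matrix names and shapes are made up for illustration, not from the slide.

```python
import torch

# tanh( x @ w1  +  h @ w2 ): one tanh node over two matmul results
x = torch.randn(1, 4)    # input row vector (illustrative shapes)
w1 = torch.randn(4, 3)   # input-to-hidden weights
h = torch.randn(1, 3)    # previous hidden state
w2 = torch.randn(3, 3)   # hidden-to-hidden weights

out = torch.tanh(x @ w1 + h @ w2)
print(out)
```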
This is another computational graph, a representation of the Google Inception model, which has around a hundred layers. It's almost impossible to do that by hand, or even in a program, if you don't have this computational graph approach. So you implement it as a computational graph, and the advantage of having the graph representation is that you can traverse the graph whenever and wherever you want, take the gradient along with you, and update whatever weights or parameters you want, and you can use that to get a better solution.

Okay, so now I'm done with computational graphs, and we're going to discuss the two approaches people have taken towards them. The first is the static approach, the traditional way: MXNet, TensorFlow, Theano, all those popular deep learning frameworks are implemented based on the static graph approach. In a static graph, what you essentially do is compute your graph beforehand and keep it with you. The framework tries to optimize the graph definition you have written and keeps it in memory, and then you loop through your data and feed it to the graph. The only thing you should be looking at is where the loop happens: you are looping through your data, and the graph definition is outside that loop.

The difference in a dynamic graph is that you're not defining your graph beforehand: the graph definition is inside the loop where you iterate through your data. PyTorch is not the first framework to implement dynamic graphs. We had autograd almost four years back, and we have Chainer and DyNet, but the problem was that all these dynamic graph frameworks were concentrating on research. What PyTorch did is take the front end of Chainer and merge it with the back end of Torch, which is a well-known, well-tested and super fast framework, so you get a GPU-optimized framework with dynamic capabilities.

Okay, I have a small example here, doing the XOR operation in PyTorch and in TensorFlow. I don't want you to look at the rest of the code; the only part I want you to look at is the loop. First, the TensorFlow side. In the first line we add the bias to the result of the matrix multiplication, but when you add those two variables, they have no data in them; they are just placeholders, so there's no mathematics happening there. TensorFlow is building a graph definition in the back end and keeping it in memory. That block in TensorFlow is the graph definition, which sits outside your loop, and then you loop through your data: basically it loops through the epochs and feeds the data into that graph. So the graph is built only once. In the case of PyTorch, the graph definition, the matrix multiplication and everything, happens inside the loop, and all those variables have data in them. When you call x.mm(w1), x has data in it, w1 has data in it, and the computation happens the moment the interpreter reaches that line; it is not waiting for data to be fed in the future.
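The slide's XOR code isn't reproduced here, but a stripped-down sketch of the same contrast might look like this; the shapes and the toy data are my own, not from the slide. The TensorFlow half uses the 1.x API the talk is describing.

```python
# TensorFlow 1.x style: the graph is defined once, outside the loop,
# on placeholders that hold no data yet.
import numpy as np
import tensorflow as tf

data = [np.random.randn(4, 2).astype("float32") for _ in range(3)]

x = tf.placeholder(tf.float32, [None, 2])   # placeholder: no data in it
w = tf.Variable(tf.random_normal([2, 1]))
b = tf.Variable(tf.zeros([1]))
y = tf.matmul(x, w) + b                     # graph definition only

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for batch in data:                      # loop: feed data to the fixed graph
        sess.run(y, feed_dict={x: batch})

# PyTorch: the same lines live *inside* the loop and execute immediately,
# on tensors that already have data in them.
import torch

w2 = torch.randn(2, 1)
b2 = torch.zeros(1)
for batch in data:
    t = torch.from_numpy(batch)             # tensor with real data
    out = t.mm(w2) + b2                     # computed the moment this line runs
```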
Okay, so let's do a getting-started with PyTorch. This is the PyTorch homepage, and they have a decent UI. You have a couple of buttons there: you select your options, Linux or OS X and so on (I don't know when they will support Windows), and you get a command. You paste it in your terminal and you are ready to go with PyTorch. You import torch, not pytorch.

When the core developers started thinking about PyTorch, they wanted it to be as similar to NumPy as possible, so that a beginner who started coding in NumPy could easily migrate to PyTorch. This is how you create a random matrix; you have another option, Tensor; and you call x.size() to get the shape, which returns an object that inherits from Python's tuple, so you can do on it all the operations you could do on a Python tuple. And you can index and slice on all the dimensions, just like on a Python list; here I'm doing some slicing on the second dimension.

Okay, another important thing about PyTorch is the NumPy bridge. If your dataset is stored as a NumPy array, you can convert it to a PyTorch tensor by calling the from_numpy function, and you can convert a PyTorch tensor to NumPy by calling x.numpy(), which gives you a NumPy array back.

Another important difference between TensorFlow and PyTorch: if you want to do a GPU operation in TensorFlow, you have to install the GPU version of TensorFlow, but PyTorch does not come in two versions; there is only one version. When you do x + y, the operation happens on your CPU, and the moment you want to move to the GPU, you call x.cuda() and you get a GPU tensor back. If you look here, I'm doing the same operation, but it's happening on the GPU. I have often seen this approach in PyTorch code where people create CPU tensors and do their CPU-friendly operations on the CPU, and the moment they need the parallelization, the GPU optimization, they convert to CUDA and do the operation there. If they need that tensor back on the CPU, they convert it back by calling the .cpu() method. That's really handy, especially if you're worried about CPU versus GPU optimization issues.

Okay, so this is really important. This is basically the backbone of PyTorch: the autograd package, which is responsible for doing the automatic differentiation for you. Autograd has a class called Variable, which has data, grad and creator in it. data is intuitive: it has the data in it, and when you want your number back as a plain Python object, you just call .data and you get it. You get your gradient in the .grad attribute: when you do a backward pass, the gradient is accumulated in .grad. PyTorch does not update your data automatically; it waits for you. It just accumulates the gradient in .grad, and you are responsible for updating your weights, your parameters.
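Here's a compressed sketch of the basics just covered. The guard around .cuda() is my addition so the snippet also runs on a machine without a GPU, and note that in PyTorch 0.4+ the Variable the talk describes has been merged into Tensor (it's still importable).

```python
import torch
import numpy as np

x = torch.rand(3, 4)            # random matrix, NumPy-style API
print(x.size())                 # torch.Size([3, 4]); behaves like a tuple
print(x[:, 1])                  # slicing on the second dimension

# NumPy bridge, both directions
a = np.ones((2, 2), dtype=np.float32)
t = torch.from_numpy(a)         # NumPy array -> PyTorch tensor
b = t.numpy()                   # PyTorch tensor -> NumPy array

# CPU -> GPU -> CPU
if torch.cuda.is_available():
    g = x.cuda()                # GPU tensor; operations here run on the GPU
    x = g.cpu()                 # and back to the CPU

# The Variable described in the talk (pre-0.4 API)
from torch.autograd import Variable
v = Variable(torch.ones(2, 2), requires_grad=True)
out = (v * 3).sum()
out.backward()                  # gradients are *accumulated* into v.grad
print(v.data)                   # the raw data; backward() never touches it
print(v.grad)                   # you update the parameters from this yourself
```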
.creator is probably the most interesting of the three. This is how PyTorch keeps the whole graph together: any node has its creator in the .creator attribute, and you can traverse from one node to any other node by calling .creator, and then .creator on that creator, and so on.

Okay, the next piece in PyTorch autograd is the Function module. When you do x + y, that plus is part of the Function module; when you do a tanh operation, that tanh is part of the Function module. Basically, the .creator attribute I was talking about points to a Function object, because when you take two tensors and do an addition, you get another tensor, which was actually created by that addition operation.

So that's the overall parent autograd package. As the website says, it is central to all neural networks in PyTorch, and it has all the automatic differentiation machinery in it.

Okay, this is a typical example given on the PyTorch website to show the dynamic capabilities. When the interpreter is at the last of those lines, w_x = Variable(...), it has data in it, and the only nodes created in the graph are those four nodes. PyTorch has no clue what is going to come next, what the graph structure will be in the future. When the interpreter reaches the next two lines, it creates the matrix multiplication, then the addition, and then does the tanh. As the interpreter reaches each line, that's when PyTorch creates your graph, and when you call .backward(), you do the backward pass. In the next iteration the graph structure could be different; it doesn't really matter. In each iteration you can dynamically create any graph structure you want, and when you do the backward pass, you go backward through the graph you created in that iteration. Cool.

I have a Jupyter notebook here with some code in it. x is a PyTorch Variable, and I'm printing x.grad and x.data. grad has no value in it, because I haven't done my backward pass yet, and x.data is 12.3, a floating point number. Then I'm doing the Function operations: I take the power, multiply, do a subtraction, pass that through the hyperbolic tangent, and then I call backward. When I execute that, the graph created by those operations is backpropagated, and now, when I print the same things I printed on the second line, I have the gradient in x.grad (it's zero, because there is effectively no gradient here), and the data is the same, because the backward pass does not update the data.

Cool, so this is probably the interesting part. You can try printing the creator of z, and it says it's the Tanh function. From the same z you can print z.creator.previous_functions, and it gives you the multiplication operator. So on the third line I'm printing the creator of the creator of the creator of z: from the last node I can traverse back to any node I want through the .creator attribute.

This is just a pictorial representation of the same thing: I have this x variable, there's a function f and there's a function g, and all of it is connected by the creator attribute.
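Here's a runnable sketch of that notebook flow. The exact expression from the notebook isn't on the slide, so the constants below are illustrative, and this uses the modern names (.grad_fn / .next_functions) for what the talk, on the pre-0.4 API, calls .creator / .previous_functions.

```python
import torch

x = torch.tensor(12.3, requires_grad=True)
print(x.grad, x.data)            # grad is None: no backward pass yet

z = torch.tanh(x ** 2 * 3 - 7)   # power, multiply, subtract, then tanh
z.backward()

print(x.grad, x.data)            # grad is filled in; data is untouched
                                 # (tanh saturates here, so the grad is ~0)

print(z.grad_fn)                 # TanhBackward -- the talk's z.creator
print(z.grad_fn.next_functions)  # its inputs' creators, one hop back
```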
Okay, cool, so I'm done with the introduction. Now we start the actual talk: why did I switch, and why should you switch?

So basically, I'm not saying PyTorch is better than TensorFlow by any means. What I'm trying to say is that there were specific reasons why TensorFlow did not work well for me and PyTorch did. If you have the same reasons, if you are worried about the same things in your day-to-day job, you should probably take the decision that I did.

Okay, these are a couple of tweets I took from last month, I guess, a couple of funny ones. You can read them later.

Okay, so these are the four reasons I want to put across to convince you to switch. The first is the abstraction. torch.nn.functional has all the functional operations, torch.autograd has everything related to automatic differentiation, and there is a high-level module just like Keras: the Keras of PyTorch is torch.nn. You have all the high-level layers in torch.nn: the Sequential module, your Linear layer, your ReLU layer. You can just call them as torch.nn followed by the layer name.

Okay, so I want you to do me a favor. This is a network that does reduction, and I want you to remember that this network is doing reduction. Basically, it takes the embeddings of two words at a time, squeezes them together, and spits out a single embedding for the two words. So the Reduce network takes two words, does the reduction, and spits out a single embedding. One of the beauties of PyTorch's abstraction, the nn module, is that you can inherit from nn.Module and create any layer you want using the layers that are already defined. In __init__ I'm initializing nn.Linear with the number of neurons, and then in the forward pass I take the left word and the right word, pass them to the Tree LSTM, and get the output embedding back. Cool.
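The slide's Reduce code feeds a Tree LSTM; as a rough, self-contained sketch of the same idea (inherit from nn.Module, compose existing layers, combine a left and a right word embedding into one), with a Linear plus tanh standing in for the Tree LSTM and all names made up:

```python
import torch
import torch.nn as nn

class Reduce(nn.Module):
    """Squeeze the embeddings of two words into one embedding."""
    def __init__(self, dim):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)   # built from existing layers

    def forward(self, left, right):
        # Concatenate the two word embeddings and squeeze them back to `dim`
        return torch.tanh(self.compose(torch.cat([left, right], dim=-1)))

reduce_net = Reduce(dim=50)
left, right = torch.randn(1, 50), torch.randn(1, 50)
merged = reduce_net(left, right)   # a single embedding for the two words
```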
Okay, so this is probably the longest reason, and probably the strongest: it supports dynamic graphs. As I said, all the other popular frameworks support static graphs, and the dynamic capability of a graph has a lot of potential. It's super powerful, and I'm going to show you how.

These are the three major limitations I found with static graphs. First, if your neural network structure depends on the data, it's super difficult to implement in a static graph. You define your graph outside the loop, and only when you iterate through your data do you see the structure of your input data; if you want to build a neural network whose structure depends on that data, it's extremely difficult in a static graph. But since in PyTorch everything happens dynamically, you can create your graph on the fly. A typical example would be sequence data, where you need a sequence unit for each word in your input. All the static graph frameworks have workarounds for this, like tf.while_loop or tf.dynamic_rnn, but PyTorch lets you use the language primitives, the language constructs, and implement it on your own: you can just put a for loop inside your forward function and you have the dynamic capability. You don't have to learn a new construct like tf.while_loop to get dynamic behavior in your graph.
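A minimal sketch of that last point, with entirely made-up module and names: a plain Python for loop inside forward, one cell application per word, no tf.while_loop anywhere. The graph depth simply follows the sentence length.

```python
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.cell = nn.RNNCell(dim, dim)

    def forward(self, words):              # words: (seq_len, batch, dim)
        h = torch.zeros(words.size(1), words.size(2))
        for w in words:                    # ordinary language construct:
            h = self.cell(w, h)            # one graph node per word
        return h

rnn = TinyRNN(dim=8)
short = rnn(torch.randn(3, 1, 8))          # 3-word sentence
long = rnn(torch.randn(11, 1, 8))          # 11-word sentence, same code
```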
Okay, the next point is also really important, especially in the field of reinforcement learning, and in natural language processing as well. If your neural network structure depends on the computation happening inside your network, there is no way, other than some hacks, for a static graph to implement that. I have a really good example for this too. And the third point: if you are into research and you want to change the gradients or the weights on the fly, that's very difficult in a static graph, but it's natural and intuitive in a dynamic graph framework like PyTorch.

So basically, the reason deep learning frameworks exist, after all, is to make the life of a developer easy. But when you start approaching a new problem, when you start conceptualizing or prototyping something, what static graphs do is restrict the way you think: you have to sit inside a box with a lot of constraints and think from in there about how to approach the problem. A dynamic graph actually opens up that box for you.

And this is the example I was talking about. This is research done by the Stanford natural language inference group, and it addresses both of the problems I was talking about by implementing them in a dynamic graph: this particular problem has a neural network structure that depends on the input data, and a neural network structure that depends on the computation happening inside the network. The problem is this. You have a sentence; okay, I'm not going to define the whole problem, just what we need here. The task is sentence classification: you get a sentence and you classify it. The way they approach the sentence classification problem is what makes the dynamic graph implementation much easier than a static graph one.

I have a typical example here: "The church has cracks in the ceiling." They have 500k sentences which are parsed and stored as parse trees, parsed based on the phrases: "the church" is a verb phrase, I guess; no, it's a noun phrase; "has cracks in the ceiling" is a verb phrase; and "in the ceiling" is a prepositional phrase. So you have this parsed data with you, and you approach it another way. You keep another list with you that has the operation you should do at each point in time: s means shift, and r means reduce, and reduce is what we saw before, the reduction network. Basically, the way we approach this problem is: when the transition list says s, we push one word from the buffer of input words onto a stack; when it says reduce, we pop two words from the stack, do the reduction, and push the result back onto the stack.

Okay, I have a representation of that. This is the transition list, with the operations in it; that's the buffer, with the words in it; and you have a stack. When it says s, you shift a word. We have one more s, so you shift the next word, and then it says r, so you pop those two words from the stack, do the reduction (they pass through that Reduce network I defined), and push the embedding you get back onto the stack. That's reduce. Then we have a couple of shifts, then a couple of r's; it always pops the top two and does the reduction. We have one more shift, but that's a full stop, and two more reductions. After this process, what you get is your sentence embedding.

So the problem I was talking about is this: the sentence length, and hence the length of the transition list, can vary, and that's exactly what's difficult to implement in a static graph, because the length of that list depends entirely on the input data. And you have an if-else case there: if the transition list says s, you push a word onto the stack, and you're not doing any neural network operation; but if it's an r, you pop those two words and pass them to the Reduce network, so there is a neural network operation; you have to build the graph only if it's an r. So you have this if-else condition check inside your network, and I don't know whether even TensorFlow Fold can implement the if-else problem; it can address the dynamic sequence length problem.

Okay, so this is the sentence classifier, in the abstract. I don't want you to look at any of the code other than the 66th line, which is where the particular problem I was describing sits, in the SPINN module. The left-hand side is the SPINN network, and the right-hand side is my training code. Like I said, on the 14th line it loops through the transitions, which can vary for each data point, and in this box it does that if-else check: if there is a reduce, it has to create the reduction graph and do the reduction operation.

On the right-hand side I have the training loop. You have the optim module in torch, which has all the optimization algorithms like RMSprop or Adam or Adadelta and so on; you define one and pass your model's parameters to it. Basically, every model parameter is a Variable that has data and grad in it, so the optim package gets all the variables with grad and data in them. When you do the backward pass, it calculates the grad, and when you call optimizer.step(), it takes the value from grad and updates your data. Really simple.
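Putting the two sides together, here is a toy sketch of that shift-reduce loop and training step. The Reduce class repeats the earlier sketch; the transition string, word buffer and loss are stand-ins of mine, not the SPINN code from the slide.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class Reduce(nn.Module):                 # same toy stand-in as before
    def __init__(self, dim):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)

    def forward(self, left, right):
        return torch.tanh(self.compose(torch.cat([left, right], dim=-1)))

def encode(transitions, words, reduce_net):
    stack, buffer = [], list(words)
    for op in transitions:               # length varies with the sentence
        if op == "s":                    # shift: no neural network operation
            stack.append(buffer.pop(0))
        else:                            # reduce: the graph is built only here
            right, left = stack.pop(), stack.pop()
            stack.append(reduce_net(left, right))
    return stack.pop()                   # the sentence embedding

reduce_net = Reduce(dim=50)
optimizer = optim.Adam(reduce_net.parameters())   # or RMSprop, Adadelta, ...

words = [torch.randn(1, 50) for _ in range(3)]
sentence = encode("ssrsr", words, reduce_net)

loss = sentence.sum()                    # toy loss, just to drive backward()
optimizer.zero_grad()
loss.backward()                          # gradients accumulate into .grad
optimizer.step()                         # step() uses .grad to update the data
```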
Cool, okay, so this is the third point: easy debugging. If you have used TensorFlow before, I'm pretty sure you have felt the pain of debugging in TensorFlow. It's really a mess. The problem is that when your code breaks, when your neural network breaks, what you get back is a traceback from the execution engine TensorFlow has created out of the graph definition you wrote: a set of code that was not written by you, and which you are probably not going to understand easily, because it's really complex. You have to scroll through three or four pages of terminal errors to find where exactly the error occurred. It's really painful. But since PyTorch is imperative, you can use a breakpoint, or maybe a print in Python, to find exactly where your code breaks, and you can fix it.

The last point is control over the data and grad. I was talking to one of my friends, especially about this PyTorch-NumPy conversion, and he told me, "I've never been so close to the weights, or the gradients." That's exactly the thing: you have access to your data and your gradients at any point in time in your graph, be it inside the for loop or in the middle of your training. You don't have to call a session and get your data back from the session to understand what exactly is happening in a particular part; you just print that particular variable, and you have it there in your terminal.

Okay, looks like I do have time for the bonus. PyTorch was actually created as a research framework, but what Facebook has promised is that they will build the interop between PyTorch and Caffe2 really soon, and once that is done, it's like TensorFlow and TF Serving: you build your prototype, your model, in PyTorch, migrate it to Caffe2, and then you are ready to go on any device, because Caffe2 is optimized for mobile devices as well, and it has a pretty good serving module, as per Facebook.

Okay, another thing PyTorch lacks is visualization. PyTorch does not have any official visualization options yet, but people have tried different approaches to visualize the network, and what the PyTorch core developers thought is: TensorBoard is amazing, we don't have to build something again, we can just use TensorBoard. There will be an official integration between PyTorch and TensorBoard real soon.

Okay, so this is something I've used in an encoder-decoder network I created; you will have access to all the code from the slides in the GitHub repo. This is actually unrolling the whole sequence in my encoder-decoder network, and you can also print the decoder object, which is what I'm doing here: for any class you define that inherits from nn.Module, you can just print it and you get the structure of your object.

Okay, so here I have used this guy's GitHub repo; I cannot read that name. The code is in a GitHub gist; it's again the error and cost of the encoder-decoder network I was talking about, and this is the representation in TensorBoard, which took the loss and the information from PyTorch and plotted it for you. And there is another guy, lanpa, who has done a pretty amazing job, integrating probably all the features of TensorBoard with PyTorch.
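Two quick sketches of what was just described: printing an nn.Module (the decoder here is a made-up stand-in, not my encoder-decoder network), and logging a scalar through lanpa's tensorboardX package, assuming its SummaryWriter API.

```python
import torch.nn as nn

# Any class inheriting from nn.Module reports its own structure when printed
decoder = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 16),
)
print(decoder)    # prints the Sequential with its Linear/ReLU children

# Logging a loss curve to TensorBoard via tensorboardX (pip install tensorboardX)
from tensorboardX import SummaryWriter

writer = SummaryWriter("runs/demo")
for step in range(100):
    loss = 1.0 / (step + 1)                # stand-in for a real loss value
    writer.add_scalar("loss", loss, step)  # shows up as a curve in TensorBoard
writer.close()
```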
You can go and check it out. Okay, so I was actually planning to do some benchmarking, but when I googled it, I decided I shouldn't, because I've been using PyTorch for the last two or three months and I've never felt PyTorch to be slower than TensorFlow at any point, and I've never seen anyone on Google saying PyTorch is slower. In fact, TensorFlow's GitHub issue list has almost five issues saying TensorFlow is slower than PyTorch. So I'm happy. These are the links where you can find a couple of benchmarks done by different people.

Okay, so I think I'm done. Oh, nine minutes. This is the QR code you can scan to get access to all the links I have, and you can connect with me using hhsecond, my tag. Cool, I'm open for some questions.

Hi, so I've been facing similar issues in TensorFlow, and I shifted to PyTorch recently, and I think PyTorch is really good. But are there any pitfalls for PyTorch in production? Because the PyTorch developers have said that they built PyTorch for research, whereas TensorFlow is built for production. But I have not faced any issues with PyTorch in production. So have you faced any?

I haven't either. The face recognition system we have built was partly in TF; we migrated the whole code to PyTorch, and that's how I started, and it's running in production. We don't have any issues yet.

Okay, secondly: are the memory optimizations which MXNet has proposed, like recomputation of the ReLU and batch normalization, available in PyTorch?

PyTorch does not have those, and that's probably one of the reasons they say it's for research: there are no optimizations like that extra work that the framework core does for you.

All right. Thanks. Thanks, guys.