Now we'll have Akshay Bahadur on the stage. Just a quick word about Akshay though. Hey guys. So Akshay, I'll just quickly introduce you and you'll be on. Sure. So Akshay's interest in computer science began when he was working on a women's safety application aimed at women's welfare in India, and since then he's been consistently tackling social issues in India through technology. He's currently working alongside Google on an Indian sign language recognition system specifically aimed at running in low-resource environments for developing countries. His ambition is to make valuable contributions to the ML community and leave a message of perseverance and tenacity. He's one of the eight Google Developer Experts from India, along with being one of the team members worldwide for the Intel Software Innovator program. There are a lot of other achievements here, but I'll stop and let Akshay speak for himself. Akshay, the stage is yours. Alright, thank you so much. Thank you so much for joining. Let me quickly share my screen. Alright, so today we'll be talking about resource utilization and metrics for machine learning. This is going to be a 25 minute session, so we'll go into depth on some concepts, but I'm going to leave some room open for you to explore on your own. Before we start, a big thanks to the PyCon India community for organizing such a huge event; everything has been really smooth, so a big congrats to them. Okay, so before we start, just a bit about myself. These are some of the points I have written down: I have been awarded the Young Data Scientist of the Year award for my contributions. At the same time, as already mentioned, I'm also collaborating with Google on my Indian Sign Language recognition project, and I got a chance to go to California to demonstrate my project to the TensorFlow team. I've also presented my research tutorial at the IEEE Winter Conference, and in general I've been invited for over 50 sessions and keynotes on machine learning, so I can very well say that you are in good hands. Alright, these are some snippets of me at conferences, all over India and all over the world as well, where I just like to share my knowledge and my love for machine learning with everyone. An important link: if you want to follow along, just go to bit.ly/20 and you will be redirected to my GitHub page. That page contains links and references to all the topics that we'll be covering today, so if you miss out on something and want to follow along a bit later, you can do that. Just go to bit.ly/20 and you'll be redirected to this page.
So before we start, a bit about machine learning. Machine learning is nothing but an ordinary system enhanced with the help of AI, that is, Artificial Intelligence, and in machine learning we do that mostly with data. With the help of machine learning the model will learn, it will predict and it will improve over time. This is the essence of machine learning: we can convert any ordinary system, using AI, into one that makes valuable predictions which can make a difference in productivity and efficiency. So let's start with a very simple example, and I'm pretty sure everybody must have done this problem: MNIST digit classification. This is one of the very basic problems when we start with machine learning and image processing, the most simple project that anyone goes for. Let's talk about MNIST a bit first; anybody who does not know it, that's still fine, let's just explore what exactly it means. As you can see, we already have something called mnist.load_data, which comes out of the box with Keras and also with most of the frameworks, and you can simply write a function showData which will show you the different images in your data set. This is what you will get: you are given the image of the digit 3, and you also have the label 3. The whole task is to classify each of these photographs of digits. Seems really simple; let's see how we could do that. Before that, let's try to analyze the data and see if we can use some simple techniques that we might have learned in mathematics or statistics to improve our algorithm. Let's print the shape. When you print the shape we get 28 by 28, which means the images have height 28 and width 28, and if you print out all the pixel values we get these 28 by 28 values. What we notice very clearly is that the least value is 0 and the maximum value is 255, which is the highest possible pixel intensity, so the lowest will be 0 and the highest will be 255, corresponding to black and white respectively. But what we also notice is that this is a very sparse matrix, meaning it's scattered: between 0 and 255 the data is highly spread out, which means it's going to be difficult for this data to converge. So we will take the help of normalization, which is a very simple technique we might have studied back in class 11 or 12 in statistics or mathematics, and normalization is nothing but giving an upper and a lower bound to your data. Let's see how we do that. What you could do is very simple: divide your data set by the highest value, and in this case I am dividing my training set and my test set by 255. Once that happens, the shape remains the same, which means we are still working with 28x28 images, but now the spread of the data is smaller: the lowest value is 0 and the uppermost value is 1. We are containing the spread of the data and this helps with convergence, which means that ideally, or theoretically, if we go with this logic, our machine learning model will converge faster. Let's put that to use. This is a very simple concept; as I already mentioned, normalization is a very basic concept that I studied in statistics, perhaps in class 10th or 11th. Then there is another technique that we could use, because in the previous case, as you can see, the uppermost value is 1 and the lowest value is 0, which means the average is going to be around 0.5.
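For reference, here is a minimal sketch of that first step, assuming the standard Keras MNIST loader; this is not the exact notebook from the talk, and the print statements are only there to show the shape and the pixel spread before and after dividing by 255.

```python
# Minimal sketch: load MNIST through Keras and normalize pixel values
# from the range [0, 255] down to [0, 1] by dividing by 255.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

print(x_train.shape)                    # (60000, 28, 28): 28x28 grayscale images
print(x_train.min(), x_train.max())     # 0 255: widely spread pixel intensities

x_train_norm = x_train / 255.0          # divide by the highest pixel value
x_test_norm = x_test / 255.0

print(x_train_norm.min(), x_train_norm.max())   # 0.0 1.0: same shape, much smaller spread
```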
An average around 0.5 already helps with convergence, but there is another technique you could follow in which the mean becomes 0. What we do is divide by half of the highest value and then subtract 1. If you look, we are dividing the training set by 127.5, which is half of 255, and then subtracting 1. Once this happens, you can see that my lowest value is minus 1 and my uppermost value becomes plus 1, so the mean of our data now becomes 0, and mathematically it has been shown that if your data is centered around 0, your model will converge much better. Let's see if that works for us or not. Okay, so now a couple of very simple exploratory data analysis steps: I have 60,000 examples in my training set, 10,000 examples in my test set, this is the shape, and the y label has 10 classes because we are going to classify digits from 0 to 9, which is 10 digits. In this case we are using a simple network, not even a CNN, just a simple feed-forward network; let's see how it works for us. Okay, so first we are going to work with non-normalized data. The data is unnormalized and we are running it for 2 epochs; let's see how it looks. This is just the model definition, you can check it out a bit later, and the total number of trainable parameters is around 44,000. Now let's look at our training. What we see is that it takes around 60 microseconds per step, the loss is around 11 and the accuracy is around 26%. If you think logically, even a random guess gives me a probability of 10%: I have 10 output classes in total, and if I select one randomly I will still get it right 1 out of 10 times, which is around 10%. So we are doing slightly better than that, but not as well as I would have wanted my model to work. Now let's look at the normalized data set. This is the normalized data set where the mean is around 0.5, so the lower value is 0 and the upper value is 1. It takes slightly more microseconds per step to train, but what we realize is that the accuracy has actually increased to 93%. With the same setup, everything is the same, training for 2 epochs with the same model, the only thing we have changed is the very simple way we normalize our data set, and after normalization you can see that our accuracy has increased from 26% to 93% in almost the same time. This means that with a very simple technique called normalization, which we studied in mathematics, we were able to increase our accuracy almost 4 times, and this is something we will continuously work on, just to make sure that our model converges faster and we save resources. Similarly, you can also look at the data set where the mean is around 0: it performs almost equally well and takes a bit less time. Again, we are only working on the MNIST data set, so we cannot say for sure that this works best in every case, but we can clearly see that with a normalized data set our model performs much better. I just wanted to highlight that using very simple techniques from mathematics and statistics that you might have already studied, we can make a huge difference in how we train the model and how effectively we train it. And this is the accuracy and loss graph; on the right you have the normalized data set, and you can see that the accuracy is continuously increasing, which is a good sign.
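As a hedged sketch of this step, here is the mean-centred scaling together with a small feed-forward classifier; the exact layer width is an assumption, chosen so the parameter count lands near the roughly 44,000 trainable parameters mentioned in the talk.

```python
# Sketch of the mean-zero variant and a simple feed-forward MNIST classifier.
# The 56-unit hidden layer is an assumption made to land near ~44k parameters.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale to [-1, 1] so the mean of the data sits around 0.
x_train = x_train / 127.5 - 1.0
x_test = x_test / 127.5 - 1.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(56, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()                        # roughly 44k trainable parameters
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
```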
Now let's talk a bit about optimizers. Optimizers are nothing but the mechanism that tells your model how to optimize, how to update its weights. Look at this very simple example: the model remains the same, the only thing I have changed is that I have added an optimizer variable, and I am going to pass it at compile time to see how different optimizers work for the model. This is the same model from before, just a normal feed-forward network, and as you can see, the number of trainable parameters is also the same. Let's go forward and talk about stochastic gradient descent. With stochastic gradient descent you can see that the loss is around 1.3, we are working with the normalized data set, and the accuracy is around 54%, so it works reasonably well, but the main point I wanted to make is that it takes less time. Stochastic gradient descent is mainly used in places where you don't have a lot of time per step and you would like your model to converge as quickly as possible. If, let's say, I am training on an incoming stream of data, I am going to use stochastic gradient descent because it works very fast. But if I have more time, maybe I can use RMSprop, which is a more sophisticated optimizer; it takes a bit more time, but as you can see the accuracy is 85% after just one epoch. So these are different techniques for the same job: two optimizers with different behaviour and different use cases, and based on your use case you can choose either of them and they will work really well for you. Now let's talk about activations. An activation is nothing but giving your model non-linearity, and I think you already know a lot of the very popular activation functions like ReLU; we will look at those and discuss them quickly. Now you can see that I have used an activation variable and applied an activation function to both dense layers, and of course we are using the same model and the total number of parameters is the same. With sigmoid, as you can see, it takes less time but the accuracy is around 79%; with ReLU, the rectified linear unit, the accuracy is around 86%. So different activation functions help you deal with different factors. One problem with ReLU, which I have not discussed on this slide, is that it is prone to dead neurons getting stuck at zero, so you might want to check out Leaky ReLU and see whether that works for your use case. But as you can clearly see, sigmoid works well and takes less time, while ReLU works better in terms of accuracy but takes slightly more time because it has to do a bit more calculation. Let's talk about learning rate decay and how effective it is. With Keras you can write a simple per-batch callback, and if you look at this function, it does nothing but divide the learning rate by the total number of iterations, so as the iterations increase, my learning rate goes down. This is nothing but a very simple example of learning rate decay. Let's say you use this function and pass this learning rate tracker in as a callback; let's see how it works with our training. Now you can see that on the first epoch my learning rate starts at its initial value, and as the iterations go forward it decays to around 0.0003 and then, of course, to a much smaller value, and you can see that the accuracy still comes out really well.
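Here is a hedged sketch of the two ideas from this part of the talk: an optimizer chosen through a variable and passed at compile time, and simple learning rate decay. The talk describes a small custom per-batch callback; this sketch expresses the same decay idea per epoch using Keras's built-in LearningRateScheduler, and the initial rate and decay formula are assumptions, not the talk's exact values.

```python
# Sketch: swap the optimizer at compile time, and decay the learning rate by
# dividing the starting rate by the number of epochs seen so far.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(56, activation="relu"),   # swap "relu" for "sigmoid" to compare
    tf.keras.layers.Dense(10, activation="softmax"),
])

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)   # or tf.keras.optimizers.RMSprop()
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

def decay(epoch, lr):
    # Divide the initial learning rate by the iteration (epoch) count.
    return 0.01 / (epoch + 1)

model.fit(x_train, y_train, epochs=5,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(decay, verbose=1)])
```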
So with the same configuration, if you use these very small techniques that we are all aware of, and use them effectively, they are going to help you save resources and build a much more efficient machine learning model. Now I just want to show you an example of how you can integrate machine learning with other kinds of techniques as well, so let's look at this video. Here I have used machine learning together with image processing so that I can do a very simple real-time prediction. This is an image processing, or computer vision, application that I have written: it captures the alphabets from the screen and runs three models, a logistic regression, a shallow network and a deep network, and you can see that the deep network performs well in most of these cases. I just want to give you an example that machine learning can be integrated with other technologies to build a product that caters to different tasks. Similarly, I have developed a digit autoencoder, and all of these projects are available on my GitHub if you want to check them out. What it does is that I first train my model to memorize these digits, and then, based on those digits, the model tries to draw them from its own memory. You can see three windows to the left: a shallow network, a deep network and a CNN, and because CNNs work well with images, my CNN has a better memory, better memory as in it is more intelligent, so you can see that it draws the digit 8, or the digit 4 in this case, with better precision. This is one of the very simple techniques you could follow along with; it might solve some use cases for you, and mostly it is very good for your learning process. It is based on the simple concept of encoder-decoders in machine learning. Let's move forward; now we will talk about the CIFAR-10 data set. Very similar to what we did earlier on MNIST, we will be working on CIFAR-10 for a couple of minutes, and what I wanted to share is that the network architecture also plays an important role. Up until now we have only been working with very simple networks, feed-forward or MLP kind of networks, and very simple data analysis. The important thing to notice here is that now we are dealing with coloured images, and if you see the image on the right, I think it is a frog, but it is very difficult even for a human to get this information, so it is going to be extremely difficult for a machine learning algorithm to do the same. So now we are working with image data, using the same feed-forward architecture we have seen before, and of course we have added another dense layer of 1024 units, but if we do a model.fit on this network, it does not perform well. You can see that the accuracy stays almost the same, which means that my model is not learning, and ten percent accuracy means it is making random predictions: with 10 classes, if I select one randomly I will still have a probability of getting it right 10% of the time. So this does not work well for me, it is taking a lot of time as well, and as you can see the accuracy fluctuates, it increases at first and then goes down, which means that my model is not learning at all.
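As a hedged sketch of that baseline, here is a flatten-and-dense network on CIFAR-10; the exact layer sizes from the talk's notebook are assumptions, and the talk reports this kind of network hovering around random, roughly 10%, accuracy.

```python
# Sketch: a plain feed-forward baseline on CIFAR-10 (the approach that the
# talk shows failing to learn much beyond random guessing).
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0      # 32x32x3 colour images

baseline = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
baseline.compile(optimizer="adam",
                 loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
baseline.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
```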
So what you could do is this: when we are working with different use cases, we can actually change our network architecture based on the need, and in this case I can use a CNN. CNNs are very effective when it comes to dealing with images. How that works is a bit beyond the scope of this talk, but I will give you a very brief idea: when working with images, CNN layers can learn specific features from the image, and those features translate across the image, which is why CNNs are so effective for images and videos. So you have a conv layer, and of course you have max pooling and other layers as well which help the conv layer learn effectively. Now the total number of parameters is lower, and at the same time the accuracy is 36%, which is not great, but it is still better than 10%, which means we can still change our network architecture a bit to make it much better. What I wanted to show you is that it takes less time if you are working with CNN layers, and if you use good, efficient techniques in between, like max pooling and dropout, it is going to be really helpful and you can efficiently keep learning for a longer period of time. As you can see, the accuracy is increasing over time, which means we are still learning, and that is a good sign; we have definitely improved our model based on our knowledge of different networks. Now let's talk about this research paper by NVIDIA called End to End Learning for Self-Driving Cars; I think it came out in 2016. This is the data: basically, given an image of the road, as the label you are given the steering angle. These are different images from the data set itself, and this is the model architecture that is given in the paper. What we see here is that the total number of parameters is around 20,000, and the input size per image is 66 by 200 by 3, which means the width is 200, the height is 66, and 3 is the number of channels, meaning it is a coloured image. Now let's say we resize the images to 200 by 200 and load the entire data set into memory: it is going to use around 8.7 GB of RAM, and this is crucial, because if I am working on a personal laptop and don't have that much memory, this algorithm is not going to work well for me. So what I could do is use a simple concept called scaling, that is, resizing my images to a smaller size, and that brings the RAM usage down to about 1.1 GB, which is still okay. But what I also want to know is whether it will still work well for me, because I have scaled down my images, which means I might be losing information. The number of pixels per image goes down and, as you can check, the parameters also go down, and if I train this model it still performs well: the loss goes down to around 0.14, and this is the time taken. Then, of course, I can use another simple technique called filtering, which is basically converting my colour data through a different filter; I am using an HSV filter in this case. Let's have a look at that. It basically converts my colour images to something like black and white images, but HSV is different because it gives special importance to the different gradients, the saturation and the darkness in your image.
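To make the HSV filtering idea concrete, here is an illustrative sketch with OpenCV; this is not the exact pipeline from the talk, and the file name, resize target (chosen to match the roughly 2,500 pixels per image mentioned later) and the choice of channel are all assumptions.

```python
# Illustrative sketch: convert a colour road frame to HSV and keep a single
# channel, shrinking each frame from height x width x 3 to height x width x 1.
import cv2
import numpy as np

frame = cv2.imread("road_frame.jpg")              # hypothetical BGR input frame
frame = cv2.resize(frame, (50, 50))               # scale down to ~2,500 pixels
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)      # hue, saturation, value channels
value = hsv[:, :, 2].astype("float32") / 255.0    # brightness/darkness channel
value = value[..., np.newaxis]                    # keep a channel axis for training

print(value.shape)   # (50, 50, 1): roughly a third of the memory of the colour frame
```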
What you can notice here is that the road is darker than the other areas, so I worked on the idea that, because the road is darker, the HSV format will be able to extract the road a bit better. If I use this technique, I can do the same process in just 800 MB of RAM, and the total number of parameters has now gone down from 132,000 to just 13,000, with about 2,500 pixels per image. If I train this, the loss goes to around 0.15 and, as you can see, it is still going down, so the model is still learning. At the same time, you can even use the fit_generator function, which means I can load images at runtime. This is what happens: I can use a number of different generator functions, and even though the total count is around 10 million, because I am loading images at runtime I am not really concerned about having all of that data in my RAM at one point. As you can see, the loss goes down exponentially; it takes a lot of time, around 2 seconds per step, but the loss keeps going down, although the RAM usage here is of course 12.8 GB, so that is something you might want to keep in mind. And this is the simple autopilot demo that I built, based on the HSV format, the least resource-intensive format I just discussed. It does not perform well in crowded areas, as you can see, but it still has a general idea of how the model should behave, and this is where the technique is effective, because we have gone down from around 130,000 parameters to just 13,000 and my model is still performing as per expectations. I am just going to move ahead because I have less time now. We also talked about ISL, Indian Sign Language, which I am working on. The problem statement is that around 40 million people in India have speech and hearing impairment, and the solution should be a standard ISL recognition system; you can use AI and of course keep the resource intensity to a bare minimum. This is the preparation part, you can have a look at it: I went to an NGO and learned sign language, this is the data set I prepared, and as you can see I am working with filtered images, not coloured images, just the skeleton images of my face and my hands. You can have a look at the whole video a bit later as well. This is what the output looks like, and I even got a chance to take this work to the TensorFlow team; this is a small snippet from the video, so you can watch the entire video in your free time. If you have any questions, feel free to reach out to me, you can reach out to me now as well; both of these works are on my website, and of course you can give me feedback if you feel there were some points I was lacking in this session, or if you feel I can improve on certain aspects. Feel free to reach out, and I will be more than happy to include those points in my presentation next time. So this is it, this is Akshay Bahadur, you can find the content at www.AkshayBahadur.com, and hopefully I will see you next time. I hope you guys are doing well. Okay, thank you so much, a very well-timed talk, but I don't think you are a stranger to talks within time limits. We have at least one question from the online audience: this can be applied to continuous data, but what are the techniques for NLP processing? Yeah, so NLP processing is definitely quite interesting. To be honest, there is a lot of pre-processing involved, and honestly NLP problems are more difficult.
They are more time consuming, because with images you have a certain boundary, you know the pixel values are going to be between 0 and 255, but with text data you can have any number of different words. Some of the techniques you could follow are pretty simple: all the standard pre-processing steps like stemming and tokenization, a lot of these basic functionalities are definitely going to be helpful. At the same time, what I also wanted to show you, and could not because of the time limit, is that using pre-trained models is very important. I am working on some techniques where we use BERT-based vectors for the learning itself; earlier, when we were training from scratch, it was difficult for us, but when we used BERT embeddings our machine learning model performed much, much better and we were able to capture critical information that was beyond our own analysis. So using pre-trained models is also pretty important, and you can use an amalgamation of all of these techniques to get a better idea of how you want to solve the problem. Thank you. Can you suggest good books for computer vision, do you have any recommendations for resources or blogs? For computer vision, I think PyImageSearch is very good, the PyImageSearch blog by Adrian, and one of my friends, Sayak Paul, is also working there; it is one of the great blogs. Apart from that, you can look at the OpenCV documentation, and there is also LearnOpenCV by Satya Mallick, who has a very good course as well as books on OpenCV. What I usually do is refer to PyImageSearch and the documentation, and there is also a Udacity nanodegree in computer vision. I hope that answers your question. Sure, I think it will be a lot easier for people to copy and paste if you also put these in the Zulip chat; I will be starting a thread for you to post them, and if you have the slides available, a link would also be great. Sure. Okay, we are exactly at the 6:20 mark, which is perfect timing, and we are so used to going over time that I am almost looking for more questions, but there aren't any right now. So Akshay will be reachable on the Zulip channel for at least a couple more minutes, I am assuming? Yeah, definitely, you can reach out to me with any questions, you can even ping me on my website or GitHub. Just to reiterate, if you want to see the links or the slides, you can go to bit.ly/20 and you will be redirected to my GitHub page, which contains links and references to all the points in this talk, so you can open an issue there itself if you have any doubts, but of course I will be available on the Zulip chat as well, so you can reach out to me there. Sure, yeah, thank you. Then I will bring you down from the stage; we have our next speaker already ready, but again, thank you, and we will see you soon. See you next year, take care, bye.