I have come from Delhi, India, and I have been associated with open source in one form or another. In the past I have contributed a Drupal module, and I have a few React Native packages available on GitHub. Before we get into the actual technology, I would like you to understand the problem we are trying to address. So let us do a small activity before we start: raise both of your hands, now cover your ears as hard as you can, and try to understand what I am saying. Even for one minute, it was very difficult for you to interpret what I was speaking. Now imagine the life of a person who is either born with or has developed a hearing disability: how difficult it must be for them to have a normal conversation with society. The problem with the methodologies we have is that all of them expect the person with the disability to adapt to society, instead of society trying to find a way to interact with them. Yes, there have been attempts in the past, and one of the most successful has been hand sign language.
Now, although hand sign language is more or less a universal mechanism to communicate, it has its own issues. One of the major ones is that it is not easy to learn: it is not possible for people like you and me to pick it up easily, and most of us are not interested anyway; we would rather learn foreign languages. It is also very difficult to find interpreters. I was reading on the internet (it is not a verified number) that in 2017 there were only 250 certified sign language interpreters in India. Imagine: a country of 1.2 billion people had only 250 certified interpreters. It may not seem like it at first, but it is actually a very deep problem. If a highly populated country is in this situation, imagine countries which are less developed and have smaller populations; it is almost next to impossible.

So how do we fix this? How it all happened is that one fine day I was thinking about what I could do in ML. My wife happens to be a special educator; she works with children with special needs. During one of our conversations the topic of people with hearing disabilities came up, and when I looked at it carefully I realized it is actually a good problem to solve using machine learning. We have real-time interpreters today that can translate from Chinese to English, from Hindi to English, from English to any other language, so why not for this language? We have character recognizers, speech recognizers, gesture recognizers; everything is there, but there is no concrete solution that I could find. So in early 2018 I started looking around, mostly working on it over the weekends whenever I could
find time, but I could not find any existing solution that came close. So I thought, let us start using neural networks to solve the problem. I started out the way people generally would: I tried to treat it as an image classification problem. For those who are not aware, image classification takes a bunch of images and classifies each into a category: this is an animal, this is a human, and so on. My thinking was: let us at least start with the alphabet. I will classify the alphabets, then join those classifications to build words and sentences, and maybe add some grammar at a later stage.

It worked initially for certain alphabets, but for others it did not, especially letters like C. C is represented like this in hand sign language, and I will show you on the demo if there is time. The problem with C is that a finger could be a little straighter; for example, in my default pose my pinky finger is not curled, it is straightened. Somebody might sign C one way, somebody else another way, and at times, because of the camera angle, a curled finger might not appear curled on camera at all. Those were the practical problems I started to face, and since it did not work initially, I thought it was probably not a good problem and stopped working on it. About a month later, while going through something else, I also came across other issues in sign language: based on certain regions, more words can be added. It is a bit like Japanese, where you can add more words to the language and even create your own words if you want.
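The idea of joining per-frame letter classifications into words can be sketched as a simple debouncing step. The `letters_to_word` helper below is my own illustrative naming, not from any library: it accepts a predicted letter only after it has been stable for several consecutive frames, which filters out the single-frame misclassifications that camera-angle problems like the C case produce.

```python
from itertools import groupby

def letters_to_word(frame_predictions, min_stable_frames=3):
    """Collapse a stream of per-frame letter predictions into a word.

    A letter is accepted only if the classifier emits it for at least
    `min_stable_frames` consecutive frames; shorter runs are treated as
    classifier noise. `None` marks frames where no hand was detected.
    """
    word = []
    for letter, run in groupby(frame_predictions):
        run_length = len(list(run))
        if letter is not None and run_length >= min_stable_frames:
            # Avoid emitting the same letter twice when a long hold is
            # briefly interrupted by a single misclassified frame.
            if not word or word[-1] != letter:
                word.append(letter)
    return "".join(word)

# Example: noisy per-frame output while the signer holds C, then A, then T.
frames = ["C", "C", "C", "G", "C", "C", None, "A", "A", "A", None, "T", "T", "T"]
print(letters_to_word(frames))  # prints "CAT"
```

Note that this simple version merges adjacent identical letters, so spelling a double letter (like the "LL" in "HELLO") would need an extra separator rule; it is only meant to show the joining step.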
So based on certain regions you might have specific words, and at times there are words represented by a specific gesture, not just spelled out letter by letter. Now imagine the training data needed: even if I take just the 26 alphabets, for those who have actually done machine learning, you do not get results with 100 or 200 images. When you are actually trying to talk, you want prediction accuracy over 90%, an interpretation very close to human intelligence, and that requires a lot of images.

One attempt, made by a student who happens to be my cousin, was to use Blender. He defined Blender animations and generated more and more images from them, and then we tried to train on those. Although we got close to the numbers we wanted, the problem remained: how would we add more alphabets, how would we add more words? That was increasingly a problem, because training a neural network is not an easy task; it takes time and computational power, and the networks themselves are huge. Imagine I train a network in India and somebody in Singapore wants to use it: I would have to retrain it for the local dialect, and then the whole model would have to be transmitted back to that specific phone, or the app updated. And my actual target is an Android app that can do this. So I kept trying. As I said, when I was not getting good results I stopped working on it for a while, and then one day I came across a blog post on Medium about estimating hand poses, and going through that post I came across
some other research work done by a couple of universities, where they used a combination of a neural network and Cartesian geometry to understand what exactly the pose is. How many of you here are aware of PoseNet? PoseNet is a neural network; I am sure some of you have seen those virtual clothing trial mirrors where you stand in front of a screen, try on whatever clothes you want, and everything moves with you as you move around. That comes from a neural network trained by a couple of universities and the research community, known as PoseNet. The reason you suddenly see this burst of AI and neural networks is that good frameworks and good datasets have been released. Some of the famous available networks are PoseNet, MobileNet, models trained on the COCO dataset, and the famous Inception from Google, which has a huge collection of libraries that can be used for object detection. But if you look closely, this is not just an object detection problem; it is more about identifying a specific pose.

Going through it, I found some more links. This one was actually the first useful thing I found: a paper published by a German university (I think it is German; sorry if I am wrong) where they used Cartesian geometry. Cartesian geometry here means geometry in three dimensions; generally the geometry we study in school has only two dimensions, x and y, but Cartesian geometry adds the third. So now I had a solution
where I can identify the posture of the hand: whether a finger is going up, going down, vertical, or curved. If I get a picture like this, the network can identify that this is a hand and these are the five fingers, and on top of that, using Cartesian geometry, you can figure out that this finger is curled, this finger is pointing up, this finger is pointing diagonally. That was my eureka moment, because it largely solved the problem I was trying to solve.

This was actually a guest post by a guy called Prasad Pai; he had done some work based on the work of those researchers. I am sorry if you were expecting to see the code here; the code is already checked in on GitHub and I will show you a sample too, but it is very difficult to demo live, because these trained networks are big files, 400 to 600 MB. When I started looking at it, I found that a lot of work has been done in this area, and my approach was correct: whoever has made progress has done it by identifying the pose the hand is making, instead of just classifying raw images. If you go to this link you will find a lot of material: Erik's papers, which cover a lot of work, and pointers to the data files that are used, and from those data files you will find many more connections. I will show you, if I can, a lot more datasets that have been trained and can be put to other uses. This is like a master list: there are a lot of networks developed by various universities, a lot of research has already gone into it, and specific formats have been defined. So this was
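As a sketch of that geometry step: given 3D keypoints for one finger, as hand-pose networks typically produce (four joints from knuckle to fingertip), you can decide whether the finger is curled by summing the bend angles between consecutive bone vectors. The `is_curled` helper and its 45-degree threshold below are my own illustrative choices, not from the paper mentioned above.

```python
import math

def angle_between(v1, v2):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))

def is_curled(joints, bend_threshold_deg=45.0):
    """Return True if a finger is curled.

    `joints` is a list of four (x, y, z) keypoints from knuckle to
    fingertip. We sum the bend angles at the middle joints: a straight
    finger gives angles near 0 degrees, a curled finger much larger.
    The 45-degree threshold is an illustrative choice.
    """
    total_bend = 0.0
    for i in range(1, len(joints) - 1):
        prev_bone = tuple(b - a for a, b in zip(joints[i - 1], joints[i]))
        next_bone = tuple(b - a for a, b in zip(joints[i], joints[i + 1]))
        total_bend += angle_between(prev_bone, next_bone)
    return total_bend > bend_threshold_deg

straight = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]
curled = [(0, 0, 0), (0, 1, 0), (0, 1.5, -0.9), (0, 1.0, -1.6)]
print(is_curled(straight), is_curled(curled))  # prints "False True"
```

The same per-finger curl flags can then be combined into a rule for a letter like C (all four fingers partially curled, thumb opposing), which is more robust to camera angle than classifying raw pixels.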
basically the research that had gone in. Based on it, yes, I was able to identify poses, but I still had to get this to work. There was a solution that could identify poses, but the poses were not labeled, so the work remained: not only do we have to identify poses, we have to label them. The second problem was that these solutions worked on static images, but I have to do this in real time, and interpreting moving objects and changing poses in real time is not straightforward. Has anybody here worked with OpenCV? If you try to run network inference on an OpenCV stream in real time, just watch how badly the FPS rate drops. So we had to look into other solutions, like threading, to get this to work.

Finally, and luckily, in January this year we managed a breakthrough. This is a screenshot I took sitting right here (that is why I was late; I thought it was better to take a shot on the spot): this is the letter B being identified. So right now we are able to identify a lot of alphabets. It is still a work in progress, and what I need now is help from you, because you are from different countries. I would like you to connect me to NGOs, institutions, or any other projects working in your country for people with hearing disabilities. (Actually, I was initially told my talk was 20 minutes, but I see it is listed for 40; I only prepared for 20.) So I want your help connecting me to such organizations, because I want a lot of
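The threading fix for the dropping FPS can be sketched as a producer/consumer split: a capture thread keeps only the latest frame, and the inference loop processes frames at its own pace without ever building a backlog. The `fake_capture` and `fake_infer` stand-ins below are assumptions replacing a real camera and pose network, so the structure is what matters here, not the names.

```python
import threading
import queue
import time

def capture_loop(frame_queue, get_frame, stop_event):
    """Producer: grab frames and keep only the most recent one."""
    while not stop_event.is_set():
        frame = get_frame()
        # Drop the stale frame so slow inference never sees a backlog.
        try:
            frame_queue.get_nowait()
        except queue.Empty:
            pass
        frame_queue.put(frame)
        time.sleep(0.001)  # mimic the camera's frame interval

def run_pipeline(get_frame, infer, num_frames):
    """Consumer: run inference on the freshest available frame."""
    frame_queue = queue.Queue(maxsize=1)
    stop_event = threading.Event()
    worker = threading.Thread(
        target=capture_loop, args=(frame_queue, get_frame, stop_event), daemon=True
    )
    worker.start()
    results = []
    for _ in range(num_frames):
        frame = frame_queue.get()     # blocks until a fresh frame exists
        results.append(infer(frame))  # the slow step runs outside the capture loop
    stop_event.set()
    return results

# Stand-ins for a camera and a pose network (assumptions, not real APIs).
counter = iter(range(10**6))
fake_capture = lambda: next(counter)
fake_infer = lambda frame: f"pose-for-frame-{frame}"

print(run_pipeline(fake_capture, fake_infer, 3))
```

With a real camera, `get_frame` would wrap `cv2.VideoCapture.read()`; the point of the queue of size one is that slow inference makes the pipeline skip frames rather than lag further and further behind live video.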
local information on what kinds of words there are and how the local dialects work. I am also looking for volunteers to help me out. The OpenCV-based Python code is there; I will share the link. I am also working on Android and iOS apps, but that is a little bit stuck: I wanted to show a demo, but I am having problems converting the existing TensorFlow models to the TensorFlow Lite models used on Android, so that is also a work in progress. I am looking for people willing to contribute in terms of adding more words, improving the existing software, and building a better user interface. I am also working on a web interface that I can give to the NGOs who want to work with this, and from where people can add more material.

There is also a small tool I have developed that takes a video. If you want to add a new gesture, the workflow is that you shoot a short video, about 30 seconds long, with some noise (noise as in you will have to move your hand around), and based on that we can train a neural network to map it to a certain word or alphabet. In terms of technical contribution, where I need the most help is this: there are certain words and certain alphabets which are not static signs per se but gestures; for example, Z (or Zed, as we say it in English) is drawn like this. So what we need is not just identification of a static pose. There is a complex problem still left before this becomes a usable solution, which is adding gesture-based interpretation, and that should not be too difficult a problem for somebody who is passionate about solving it. Yes, I have certain ideas. So that is more or less what I wanted to introduce. Any questions or suggestions?
"Connect" is again gesture recognition; that has to be built. See, when I started off, I did not even know sign language, and as I kept developing I realized there are other issues. Among the alphabets, only Z is drawn like this; all the other alphabets are static. But there are words, in other languages and other dialects, that are difficult to track on static images, and we need those kinds of gesture controls. Using a neural network you can actually track the movement, but robustness is required: if you have ever seen somebody communicating in sign language, they sign very fast, because for them it is like speaking the way we speak. So that processing has to be handled in a super fast way, or maybe we can ask them to slow down a little, which is still OK.

Once we are able to crack more and more alphabets, we can plug this into other translators, a text-to-speech or speech-to-text kind of model, where the person signs, you simply point an app at them, and it talks back. That is primarily what I am planning. But to reach that level, as I said, we need to build the whole database first. Right now, whatever words and basic things I could come up with are OK, but we need people who are experts at it to add more. Once that is done, analyzing a video is not a difficult task at all: you just pipe the stream from a video file instead of a camera, and the network takes care of it. So I have not done exactly what you are suggesting, but
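For dynamic signs like the Z stroke, one simple approach, which I am sketching here as an assumption rather than the project's actual method, is to record the fingertip trajectory as a sequence of 2D points and compare it against stored templates with dynamic time warping (DTW). DTW aligns the sequences non-linearly in time, which is exactly what tolerates the speed difference between a fast signer and a slow one.

```python
def dtw_distance(path_a, path_b):
    """Dynamic time warping distance between two 2D point sequences.

    DTW warps the time axis, so the same shape traced quickly or
    slowly still produces a small distance.
    """
    inf = float("inf")
    n, m = len(path_a), len(path_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ax, ay = path_a[i - 1]
            bx, by = path_b[j - 1]
            d = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify_gesture(trajectory, templates):
    """Return the template label with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(trajectory, templates[label]))

# Hypothetical fingertip templates: a "Z" stroke and a straight swipe.
templates = {
    "Z": [(0, 0), (2, 0), (0, 2), (2, 2)],
    "swipe": [(0, 0), (1, 0), (2, 0)],
}
# A slower, noisier Z traced by a signer (more samples, slight jitter).
observed = [(0, 0), (1, 0.1), (2, 0), (1, 1), (0, 2), (1, 2.1), (2, 2)]
print(classify_gesture(observed, templates))  # prints "Z"
```

A production version would normalize the trajectories for position and scale first and would likely use a learned sequence model for robustness, but the template idea is enough to show why gesture recognition is a separable problem from static-pose labeling.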
what I have tried is this: there are certain YouTube videos I found in which the communication happens in this language, and I tried to interpret them, and that is actually where I started to find the issues in my solution. One of the major issues right now is that it is slow. It can be sped up, and I am looking into ways to do that, because there are object detectors like YOLO which are very fast now; they can not only identify an object but also the speed at which it is moving. We are looking at all of that right now. What I have is a basic framework and the basic problem I am trying to solve. Yes, it can be improved, and that is why I am looking for contributors, not volunteers, to help us out. Any other questions? OK, thank you.