Welcome, everyone. My name is Piyush Makhija and I am an NLP engineer at Vahan AI. Vahan AI is a Y Combinator-backed startup where we are building a jobs marketplace to help low-skilled to semi-skilled people find jobs. We are essentially solving the problem of high-volume recruitment of the last-mile workforce for e-commerce and on-demand logistics companies.

To give you an idea of why we are even needed, take a look at some of these articles that have come out in recent times, and there are many more. The point is that the on-demand and logistics companies have created a new jobs market for last-mile delivery workers. Let me ask you a question: how many of you have ordered something online or ordered food in the last month? In the last week? Is there anybody who has never shopped online or used any of these e-commerce services? The point I am trying to make is that these services have integrated so well into our daily lifestyle that they have almost become a necessity. In some sense, the delivery person who brings products to your door on demand is as close as we can get to a modern-day Santa Claus, and the demand for these modern-day Santa Clauses far exceeds the supply.

To give you some metrics: there is currently demand for 500,000 delivery jobs in India, and this is increasing at a 30% year-on-year rate. Most of the e-commerce and logistics companies are already investing heavily in scaling up their logistics operations. To give you an example, Dunzo, a fairly new player, increased the size of its logistics operations tenfold just in the second half of 2018.

So why is this particular problem of hiring a last-mile workforce hard to solve? First, the per-month hiring requirement is extremely high: it can range from 500 to 10,000 depending on the scale of your logistics operation, and if you feel 10,000 is high, Swiggy and Flipkart during the seasonal sales have a requirement of up to 50,000 delivery jobs in a month. The attrition rate in this industry is extremely high, around 75% within 2 to 3 months. The hiring demand is seasonal and sporadic, and the hiring process itself is highly manual and unorganized. So the job space is highly fragmented, inefficient, and really hard to scale.

To put it in simple words, the problem we are trying to solve at Vahan is to build an efficient and scalable system for high-volume recruitment of this last-mile workforce. Once we had the problem in mind, these are the design considerations we took into account while building our solution. First, we need to drastically reduce manual labor, so we have to bring some form of automation to the processes involved. Second, we need to build a one-stop solution: one solution that accounts for different scales and sizes of logistics operations and works across different verticals. We should be able to operate at scale: if we have a requirement to hire 500 delivery persons one month and 10,000 the next, we should be able to scale to that. We need to account for high churn and attrition, which is a reality in this sector. And there is no LinkedIn for blue-collar workers: by this I mean there is no social or professional platform on which these people can connect with recruiters or their peers, or develop a career profile.
And finally, when we develop solutions we tend to think, "let's build an app for that." But in this particular segment an app brings a harder challenge: adoption. Will the user install my app at all? Will the user keep it after using it? A recruitment app is not a daily-use application, so it has no reason to stay on your phone once its purpose is fulfilled.

With these design considerations in mind, we at Vahan came up with the following solution: automate job-seeker engagement on existing messaging platforms. The idea is very simple. On an existing messaging platform such as WhatsApp or Facebook Messenger, you reach out to our account and get a conversation started; we recommend jobs to you and guide you through the hiring process. By the way, the number shown here is a dummy number, not the actual number we have in production; it is just to illustrate the point.

So how did we at Vahan address the design considerations? Let's go over those points. The first and biggest challenge we solved was circumventing the adoption problem. India runs on WhatsApp: 97% of smartphone users in India are already using Facebook Messenger or WhatsApp in some form, they are very comfortable with these platforms, and most use them on a daily basis. Our users already exist on these platforms, so we adapt to our users; our users don't adapt to us.

Next, we needed to automate certain processes, and we found these are the ones we can automate: pitching a job to the user, gauging the user's interest in that particular job, doing some basic screening against the job requirements, and getting them to a walk-in.

We also decided our solution needs to be a virtual assistant in order to be automated, and a virtual assistant can be of different types: IVR-based, voice-based, or text-based. An IVR-based solution is a very rigid and inflexible way to solve this problem. Our users cannot interact properly with such a system: they are restricted in the kinds of inputs they can provide and cannot ask questions of their own. You can standardize an IVR in any language, but you can only collect a limited amount of input; if the user provides any feedback, you cannot respond or adapt to the user at that point, and hence it is not an efficient mode of conversation. Next, a voice-based platform: here we have to provide real-time responses, and given the current data bandwidth and quality issues, coupled with background noise and error amplification across speech-to-text and text-to-speech systems, we found this might not be the way to go right now. Hence we decided to go forward with a text-based virtual assistant, a chatbot. We chose this because our users were already familiar with these texting platforms and use them daily, which further helps with adoption.

Furthermore, we needed to avoid hopping across different channels through the various segments of the recruitment process. To give you an idea of the current recruitment process in this domain: the first thing a recruiter does is generate leads via classifieds, through naukri.com or Quikr.
Next, they process these leads in Excel sheets or, in some cases, manually on paper. Then they engage these users via telecalling or, in some cases, through on-ground, person-to-person operations. And they screen them on site: they call the user to their hiring centers, screen them there, and get them onboarded. So the complete hiring process is very manual and very segmented; there are separate, disconnected processes interacting. We need to reduce the attrition here and bring these processes together in some way.

The next challenge is scalability. Most of the companies doing this today have very high-scale logistics operations but are hiring through very manual effort, whether telecalling or person-to-person interaction. With a virtual assistant, we can reach out to thousands of people at any instant. Next, we needed to counter churn, and we were able to do that by increasing the top of the funnel: thanks to the high adoption rate, and by bringing in automation and scalability, we can generate more leads, get more people into the system, and compensate for the churn that happens. And finally, we are able to build rich user profiles, which will ultimately lay the foundation of a professional platform for these blue-collar workers.

So essentially, the solution we have built is a goal-oriented dialogue system: we carry out conversations with the user to complete certain tasks and finally reach a goal. The goal is to qualify a user for a job that he or she may be interested in, and the tasks we perform are pitching the job to the user, gauging their interest, screening them on our platform itself, and scheduling their walk-in.

To give you a bird's-eye view of how a sample interaction happens on our platform, take a look at this example. Consider the case where we recommend a job to a user: something like "We have a job vacancy at such-and-such company in such-and-such location. Are you interested? Option 1: yes. Option 2: no." The user may choose to respond to us directly, "yes, I'm interested" or "no, I'm not interested", or they may have a query of their own. In this scenario the user asks "hi sir salary kithna melega", which means "hi, how much is the salary associated with this job?" As you can see, certain words in this utterance are misspelled, because that is the way we type: we do not care about spelling or grammar when typing on these messaging applications.

So the first thing we do here, after preprocessing, is apply basic text normalization, which is something of a fancy spell correction; I will deal with this in detail a few slides later, but for now, we normalize the data and bring it to standardized words, correcting the spellings. Once we have corrected the spellings and reduced the variability in the data, we first check whether the input utterance matches any of the option choices we provided. The user might have said "I am interested", which is an indirect way of saying yes, and we could have matched that here. But in this scenario the user is asking how much the salary is, so we go further, to an intent classification system.
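To make this flow concrete, here is a minimal sketch of the message-handling logic just described. The word lists, the keyword-based intent matcher, and the canned responses are illustrative stand-ins for the two machine learning models discussed in this talk, not our production code:

    # Minimal sketch of: preprocess -> normalize -> option match ->
    # intent classification -> response lookup. All names, word lists and
    # responses are hypothetical stand-ins, not Vahan's actual system.

    NORMALIZATION = {"hii": "hi", "kithna": "kitna", "melega": "milega"}
    YES = {"yes", "yeah", "yep", "ha", "haan", "interested", "1"}
    NO = {"no", "nahi", "nope", "2"}
    INTENT_KEYWORDS = {
        "salary_query": {"salary", "kitna", "milega"},
        "location_query": {"location", "kahan", "where"},
    }
    RESPONSES = {
        "salary_query": "The salary for this job is Rs. 15,000 per month.",
        "location_query": "The walk-in center is in Koramangala, Bangalore.",
    }

    def normalize(text):
        # stand-in for the character-level normalization model covered later
        return " ".join(NORMALIZATION.get(w, w) for w in text.lower().split())

    def handle_message(raw_text):
        words = set(normalize(raw_text).split())
        if words & YES:                      # direct or indirect option match
            return "option:yes"
        if words & NO:
            return "option:no"
        for intent, keywords in INTENT_KEYWORDS.items():
            if words & keywords:             # stand-in intent classifier
                return RESPONSES[intent]     # response lookup
        return "Sorry, I did not understand. Could you rephrase?"

    print(handle_message("hi sir salary kithna melega"))

Run on the example utterance above, this takes exactly the path the real system takes, with learned models in place of the dictionary and keyword lookups.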
In the intent classification system, we identify the user's intent, here a request for salary information, and once we have identified the intent with sufficient confidence, we can do a response lookup and provide the appropriate response to the user. That is a sample interaction on our platform, and in this particular interaction we are using two machine learning models: one for text normalization and one for intent classification. But for any machine learning model, as is always the case, your model is only as good as your data. The problem is that we need to collect that data first, and the use case we are solving has no public-domain data we can draw on. This problem is further aggravated by the following things.

First, our users mix languages; they do not stick to one "proper" language. Most users in India are polyglots: they mix multiple languages together when typing or speaking. I myself, being a native Hindi and English speaker, converse by mixing Hindi and English words in the same statement, and that is the way I type. While typing, I introduce further variations: typos, slang, abbreviations, misspellings, and mispronunciations creep into my typing. Furthermore, the vocabulary I use while mixing my Hindi and English may differ from the vocabulary you use while mixing yours; my English is going to be different from your English. Which brings me to the next point: high variability in the data. To write a simple word such as "yes" in a way you and I would both understand on a texting platform, there are at least 50 spellings: "yeah", "yep", "yes", all meaning a positive yes. I can also type it in Roman script but in a different language, like "ha" for Hindi or "howdu" for Kannada; and some native Kannada speakers, who voice yes as an "s" sound with heavy intonation, type out just "s" as a positive response. All of this brings a lot of variation into the data, and again, such data sets are not available anywhere. There is poor availability of research-grade data of this kind; the proper term for the type of data we are dealing with is code-mixed data. And the problem is aggravated further by regional and cultural influences. To give one more example: if I say "call maadi" to somebody in Bangalore, an English word coupled with a Kannada word, meaning "call me", they will most likely understand, because the phrase is associated with that region; but somebody in, say, Delhi won't understand it.

All of these problems together made it very hard for us to solve the text normalization and intent classification problems, so we had to invest heavily in data collection exercises: we had to build bots to collect data for our bot. The first exercise we tried was a jokes and daily quotes bot. We knew that on these messaging applications users frequently send each other jokes and daily quotes, so we decided to go that route. But this approach did not help us, because we received a lot of spammy messages.
We were not finding data we could use to build a comprehensive data set. So we moved on and built a translation bot: users would come to us and we would provide English translations of whatever input they gave. This seemed like a good approach, but it saw very sporadic usage; users did not come back very often, only on an as-needed basis, so it did not seem like a reliable way to collect data. We moved on again and built an English learning bot: we put together a script to help users interact with us and learn some basic English. The problem with our script was that the content we received was very narrow; we saw only a very limited kind of interaction, which did not help us build a comprehensive data set. Finally, we struck gold with a friend finder bot, an idea inspired by Chatroulette. The idea is simple: connect two people in an anonymous chat, let them talk about whatever they want, and, with their permission, collect the data. This bot got so popular that we were collecting two million messages per month. So we finally solved the cold-start problem for our data set and had a reliable way to collect comprehensive code-mixed language data from the population we were building the product for.

Next: how do we handle all the nuances of code-mixed data? The first problem is text normalization. Here is a very simple example of how our text normalization system works. Say you have two sample utterances from users: "hii i have a bike", where the English greeting "hi" is misspelled as "hii", and "mere pas bike hii", a Hindi statement meaning "I have a bike", in which the last word, "hai", is misspelled as "hii". Both utterances contain the same letter sequence, but it stands for a different word in each. A text normalization system needs to take into account the context of the statement, or the context of the question it answers, and produce the correct normalization: in the first case the user means the greeting "hi", and in the second the Hindi word "hai". So, taking the context into account, we need to produce the correct spelling. This is a sort of fancy spell corrector, and the way we built it was with machine translation models.

We started with statistical machine translation because it has lower data requirements; it gave good performance with a smaller data set. But as the data set grew, statistical machine translation stopped providing incremental gains in performance, and we moved to a neural machine translation model. A statistical machine translation model works probabilistically: it sees a certain translation happen n number of times and models it, roughly, as a Markov chain. If it has not seen a particular translation happen, it cannot account for it; in this case, "h" followed by a triple "i", a misspelling of "hai", was something the statistical model was not able to correct. A neural machine translation model, given enough input utterances, can generalize to similar misspellings, which it is able to do here with the same training data set.
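Since normalization is framed here as character-level translation, the training data is simply pairs of noisy and clean utterances split into character sequences. Below is a minimal sketch of that preparation; the example pairs and the special start/end tokens are illustrative assumptions, not our corpus:

    # Sketch: turning (noisy, clean) utterance pairs into character-level
    # source/target sequences for a seq2seq normalization model.
    pairs = [
        ("hii i have a bike", "hi i have a bike"),
        ("mere pas bike hii", "mere paas bike hai"),
        ("salary kithna melega", "salary kitna milega"),
    ]

    SOS, EOS = "<s>", "</s>"  # start/end markers the decoder is trained with

    def to_char_example(noisy, clean):
        source = list(noisy)                  # encoder input: raw characters
        target = [SOS] + list(clean) + [EOS]  # decoder output: clean characters
        return source, target

    # Build the character vocabulary the input embeddings are learned over
    vocab = {SOS, EOS}
    for noisy, clean in pairs:
        vocab.update(noisy)
        vocab.update(clean)
    char2id = {ch: i for i, ch in enumerate(sorted(vocab))}

    src, tgt = to_char_example(*pairs[0])
    print(src[:6], tgt[:6], len(char2id))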
We were able to train this particular model with over 5 million data points, and it still provides incremental improvements in performance as we add data. To dive a bit deeper into the neural machine translation model we implemented: it is a pretty standard character-level model from the literature. We have input embeddings for all the Roman characters, which we pass through a bidirectional LSTM encoder layer; that feeds an attention layer, then a decoder, which is a simple LSTM, and finally a softmax layer.

The performance we got from these machine translation models was a significant improvement over the baseline. Our baseline was a 52 percent word error rate: 52 percent of the input words did not match anything in a standard English dictionary, i.e., they were out of vocabulary for us. After applying the machine translation methods, less than 3 percent of the input words were out of vocabulary, a significant gain which further helped our intent classification. The average sentence-level BLEU score improved to 0.97. This metric lies between 0 and 1; we computed the BLEU score over every utterance and averaged over the test set, which was held out of sample.
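For reference, the evaluation just described, average sentence-level BLEU over a held-out set, can be computed along these lines. This is a sketch using NLTK with made-up reference/hypothesis pairs; the exact tokenization and smoothing choices are assumptions:

    # Sketch: average sentence-level BLEU over a held-out test set.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    held_out = [
        ("hi i have a bike", "hi i have a bike"),       # (reference, model output)
        ("salary kitna milega", "salary kitna milega"),
        ("mere paas bike hai", "mere pas bike hai"),    # one residual error
    ]

    smooth = SmoothingFunction().method1  # avoids zero scores on short sentences
    scores = [
        sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth)
        for ref, hyp in held_out
    ]
    print(sum(scores) / len(scores))  # average sentence-level BLEU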
So now that we have solved the text normalization issue, we move on to the next problem, intent classification: how do we understand the intent of the user? In this segment a user can ask different kinds of questions, things like "aap kaun ho?" (who are you?) and "why are you texting me?", and we reply to them accordingly. The user can also provide input that is more colloquial in nature. If we ask the user "do you have a driving license?" as a screening question for a particular job and the user responds with "I have LL only", then LL here means learner's license. Our normalization system is able to take that into account, and we understand the user is saying "I have a learner's license", which means "I don't have a driving license", which means the user is saying no, and we respond accordingly. Similarly, the user may send different kinds of queries or indications regarding the job or their interest in a particular type of job: "do I get a part-time job?", "what kind of salary or incentives do you provide?", and so on.

But as I said, this intent classification system is built for this domain-specific problem, so it cannot solve out-of-domain problems like a bus reservation; if you ask it to book a bus for you, it won't respond properly. And, given the data we trained with, if there is a particular utterance we did not train it to handle, it fails; for example, this case happened, and it has since been corrected. Then there are cases where your model overfits and tries to respond to questions it is not supposed to handle. Take for instance this utterance: "I don't know where to find a Pikachu." This is a fictional question, and our bot, having overfitted on certain words, "where" in this case, understands it as a possible location query and responds accordingly. It should not, but it still does.

A little more about the performance and the systems we used for the intent classification model. We experimented with the Rasa NLU and fastText libraries. We chose these libraries because with them we did not have to take on the burden of building a new embedding layer for our specific data set. We cannot use generic pre-trained embeddings such as GloVe or word2vec on this kind of code-mixed text, so we had to build our own embeddings, and these libraries do this automatically from the data we provide, provided we give them sufficient data. Rasa NLU worked well with a limited number of instances, but when we increased our data set size over time, fastText provided phenomenal performance gains. The F1 score we were able to get was around 0.94, and we were able to get up to 0.92 coverage. Coverage here means that, of all the input utterances we receive from users that are not direct responses to our option choices, 92% are ones we were able to respond to.
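To give an idea of what the fastText route looks like in practice, here is a minimal sketch of training a supervised intent classifier and measuring coverage with a confidence threshold. The training file name, labels, hyperparameters, and threshold are illustrative assumptions, not our production setup:

    # Sketch: fastText supervised intent classification with a confidence
    # threshold. Training file lines look like:
    #   __label__salary_query salary kitna milega
    #   __label__affirm haan interested
    import fasttext

    model = fasttext.train_supervised(input="intents.txt",
                                      epoch=25, wordNgrams=2)

    THRESHOLD = 0.7  # respond only when the classifier is confident enough

    def classify(utterance):
        labels, probs = model.predict(utterance, k=1)
        if probs[0] >= THRESHOLD:
            return labels[0].replace("__label__", "")
        return None  # outside coverage; fall back to a default reply

    queries = ["salary kitna milega", "where to find a pikachu"]
    answered = [q for q in queries if classify(q) is not None]
    print(len(answered) / len(queries))  # fraction answered, i.e., coverage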
So, the ongoing efforts we have at Vahan to improve our machine learning models: we are exploring different text normalization and machine translation approaches, including newer models like BERT, and we are trying to improve performance on the domain-specific data itself so that we handle users' responses very well. Apart from trying to improve the intent classification systems by adding more data, we are looking into different architectures to improve performance. We are also looking into expanding our domain to more languages. Currently we handle mixtures of English, Hindi, and Kannada very well; we are expanding to mixtures that include Tamil and Telugu, and beyond that Bengali and Marathi as well. We are also looking into building a context-based dialogue system, where we can understand the user's context and provide much more human responses instead of a basic index lookup.

A few key takeaways from the problems I have illustrated. There is an urgent need for a high-volume recruitment solution in this domain, and that need is now. There is an urgent need for a social and professional platform for these users, so that they can build career profiles and succeed in their careers. The kind of solution you build needs to adapt to the user, rather than the user adapting to your application, and the solution needs some kind of daily use for it to stay relevant to the user. There is an immediate need for research and development on code-mixed data sets, the way people actually interact with each other colloquially, and we can start by forming strategies to collect large amounts of such data. We also need to account for personalization to regional personas: we need to adapt to the particular region and the particular way a person interacts.

The long-term vision we have at Vahan is to not only solve the jobs problem for these people, but also help them hone their skills, become financially stable, and develop a social and professional community through which they can work toward holistic development. In the short term, we think that in the near future, once your modern-day Santa Claus comes to you bearing gifts, he or she will most likely have been recruited via our platform. Thank you. Questions?

Correct. There was a heavy focus on labeling that data, but we also applied different approaches to make more use of it. We use certain data augmentation approaches to increase our data set size, that is, synthetically generating data, as sketched below. So roughly half of it was manually annotated or labeled, and half was synthetically generated.

Correct. I mentioned that we are still working on building this for domain-specific data. The data we collected using the friend finder bot was a generic start; we then collected more domain-specific data from our platform itself, the job finder application on WhatsApp. From that we collected very domain-specific data and use cases, where users show us misspellings of, say, "salary", misspellings of "timing", misspellings of company names, and so on. We also train on that data, basically by guiding the user.

Considering direct and indirect responses to the option choices, around 10 to 15% of the text does not match the response choices. Anybody else?

Yes, we have obtained access to the WhatsApp Business API and we have all the business contacts. But before building the tool, we had a cold-start problem: we had no way to collect data, and even if we had launched our tool without any NLP, we would not have been collecting enough data on a daily basis with a small start. So we solved the cold start of the data collection problem there. Since then, that particular friend finder bot has been discontinued, because we are already collecting a lot of data on this platform.
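As a rough illustration of the synthetic side of the augmentation mentioned above: noisy training pairs for the normalizer can be generated by injecting the kinds of character-level errors users actually make. The noise operations and rates below are assumptions for the sketch, not our actual augmentation pipeline:

    # Sketch: synthetic augmentation for the normalizer by injecting
    # character-level noise (duplications and drops) into clean text.
    import random

    def add_noise(text, rate=0.15, seed=None):
        rng = random.Random(seed)
        out = []
        for ch in text:
            r = rng.random()
            if ch != " " and r < rate / 2:
                out.extend([ch, ch])   # duplicate: "hai" -> "haai"
            elif ch != " " and r < rate:
                continue               # drop: "milega" -> "milga"
            else:
                out.append(ch)
        return "".join(out)

    clean = "salary kitna milega"
    synthetic_pairs = [(add_noise(clean, seed=s), clean) for s in range(3)]
    for noisy, target in synthetic_pairs:
        print(noisy, "->", target)

Each generated (noisy, clean) pair can then be added to the normalization training set alongside the manually annotated data.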