Hi, I'm Vishal. I work with the data science team at Freshworks, as part of a specific product called Freshsales. Freshsales is basically a sales CRM: it helps you prioritize leads, pursue leads, interact and communicate with them better, and ultimately organize and sell to them. Now, if you've ever used a sales product, you'll realize that one of the problems with such products is that you often have too many leads. These leads might be coming from different sources, and you may not have enough agents to handle them all. So one AI feature that we decided to build was called lead scoring, which helps you prioritize the leads that are most likely to close. The lead scoring model is trained on all the deals that you've won in the past and the deals that you've lost, and we use it to score incoming leads. Now, while training this model, one insight that we got was that the number of emails you receive from the customer, and the interactions you have with them, are among the strongest features. In a lot of cases, we noticed that in deals where you were receiving more emails from the customer, it was more likely that the deal would close. But one problem with this was that we weren't looking at the content of the conversation, and the content could provide a lot more insight into the probability of the deal closing. So we first decided to go at it from a conventional sentiment analysis perspective. But the problem with this, again, is that most emails found in sales conversations are quite polite. They are rarely abusive, so you really can't use conventional sentiment analysis algorithms built for text where people are either very happy or very sad. Most sentiment analysis algorithms that work on tweets simply don't work on sales emails.
So we decided to take it from a different perspective, and there were a couple of challenges while doing this. First, I'll talk about data cleaning itself. Now, if you've ever done Twitter sentiment analysis, which is a very cliched project to do in your undergrad, you'll have noticed that most of the data you get is clean text, whereas emails usually come as HTML. So the first problem we had to overcome was cleaning the HTML to get rid of all the junk tags, the tags that aren't required. Even once we had the clean content, we had to get rid of the signatures, and for that we used named entity recognition. But the bigger problem we had was the target label. We had about 20 emails in a single lead, and we didn't know which label to assign. The label that was quite obvious to start with was: hey, did the lead eventually close or not? Was it won or was it lost? But the problem was that the label then applies to all emails in the lead. Right from the first email, where somebody says, "Hi, I'm Tom," to the last email, where the person says, "I would like to go ahead with the most basic plan you have," all emails get the same label, which is not right, because not all emails are equally indicative of winning or losing. And a majority of the emails we actually find in sales conversations are neutral, with people pretty much just acknowledging what the agent said. Labeled this way, we initially didn't get a very high AUC. So our next alternative was to use only the emails that show up towards the end of the deal, because these are more substantial, more final of sorts, more indicative of what the person is going to go with in the end.
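As an aside, the HTML cleaning step described above can be sketched with the Python standard library alone. This is a minimal illustration, not Freshworks' actual pipeline; the function name and the sample email are hypothetical, and a production version would also handle quoted reply chains and signatures.

```python
from html.parser import HTMLParser


class EmailTextExtractor(HTMLParser):
    """Collect visible text from email HTML, skipping non-content tags."""

    SKIP = {"style", "script", "head", "title"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside a tag we want to ignore

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def clean_email_html(raw_html):
    """Strip tags and junk (styles, scripts), keeping only the visible text."""
    parser = EmailTextExtractor()
    parser.feed(raw_html)
    return " ".join(parser.parts)


# Hypothetical raw email body as it might arrive over the wire.
raw = ("<html><head><style>p{color:red}</style></head>"
       "<body><p>Hi, I'm Tom.</p><div>Interested in a demo.</div></body></html>")
print(clean_email_html(raw))  # Hi, I'm Tom. Interested in a demo.
```

In practice you would follow this with the entity-recognition pass mentioned in the talk to drop names, phone numbers, and other signature content.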
This did lead to a slightly higher AUC, but again, it wasn't significant enough, so we had to manually tag a sample, and what we realized was that a majority of these emails are actually neutral. The last problem we had was embedding the emails. So once we had parsed the emails, cleaned them up, and had the target labels ready, we had to figure out the best way to encode them. We initially went with fastText and word2vec models, but combining the word embeddings into an email embedding of sorts was one challenge. We also wanted to take some learnings from TF-IDF, where you prioritize rarer terms while not giving much importance to the terms that occur frequently. So we used a form of polarity-weighted word2vec, where we multiply the polarity of each word into its word vector and then use a classification algorithm on top. And once we had an email-level embedding, the last thing I wanted to talk about was having context, because emails are often part of a dialogue. So if we're looking at a customer email, we also have to look at the agent email preceding it. So instead of scoring individual emails, we're scoring agent and customer emails together. Okay, yeah, finally. Thank you so much. Questions for Vishal? Questions? Okay, Vishal, so why did you treat this as the problem you were doing? There is a sequence in the emails, right? So did you not think of a question-answer kind of sequence and try to address it as that kind of model, as opposed to just a single classification model, if I got that correctly?
No, it's not just a question-answer model, it's also an answer-question model, because the customer also asks a lot of questions which the agent needs to address: the customer might not be sure about how a certain feature works or how to get something done. So it's not a strict one-two, one-two situation where we can assume alternating turns. They are a sequence, though. So we take all the consecutive agent emails in a row, and all the consecutive customer emails in a row, and generate predictions on those. We call that a dialogue. And so do you label each element of the dialogue, or do you label the whole stream of dialogue? We label each dialogue, and from the stream of dialogues we get a prediction. Okay, okay, interesting. That's a lot of labeling to try and do. We have one question at the back, I understand? Yeah, go ahead, ma'am. So what were some of the interesting patterns that you observed during training? For example, you said that you started using only the end emails, but how did that really help? Because you would like to know whether a deal is going to close or not when you start, in the first few emails themselves. So in the first few emails, the customer is curious. The customer is curious to try the product and hasn't made up his mind yet. However, towards the end of the deal stage, the customer has a better understanding of what he wants to buy and whether he's really going to go ahead with it. That really helps us, because towards the end the customer might make definitive statements like, "I was really looking for this feature, but it isn't present," or, "Sure, I will be going ahead with this." So because the emails towards the end are more definitive, we noticed that training just on those gave a higher AUC.
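The turn-grouping Vishal describes, collapsing consecutive emails from the same side into one turn and pairing agent turns with the customer turns that follow, could be sketched like this. The email stream and field names are hypothetical, assumed only for illustration.

```python
from itertools import groupby

# Hypothetical email stream for one lead: (sender_role, text), in time order.
emails = [
    ("agent", "Thanks for signing up!"),
    ("agent", "Would you like a demo?"),
    ("customer", "Yes, next week works."),
    ("agent", "Booked for Tuesday."),
    ("customer", "Great, see you then."),
]

# Collapse consecutive mails from the same role into a single turn.
turns = [(role, " ".join(text for _, text in group))
         for role, group in groupby(emails, key=lambda e: e[0])]

# Pair each agent turn with the customer turn that immediately follows it;
# each (agent, customer) pair is one "dialogue" to be scored together.
dialogue = [(turns[i][1], turns[i + 1][1])
            for i in range(len(turns) - 1)
            if turns[i][0] == "agent" and turns[i + 1][0] == "customer"]
```

Here the two agent emails at the start merge into one turn, yielding two (agent, customer) pairs, so the classifier sees the customer's reply with the agent context that prompted it.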
Okay, yeah, a huge round of applause for Vishal. I think there is some insight in trying to think about the buying process, awareness, interest, decision, action, and trying to see what you can get from that email conversation. It's a great attempt; continue the conversation outside and you'll get more ideas.