I would also like to invite Bharat on stage, and Bharat and Paul will have a joint Q&A. So if you have any questions about the TV presentation as well as Paul's presentation, you can ask either of the speakers. Questions?

Question for Paul. You mentioned that we can find the probability of the category of the sender. So when the sender is unknown, how is this probability actually calculated?

So are you asking about, for instance, the accuracy of the models? Okay, so that's what the model is doing, right? And you can use a lot of different models; a multinomial naive Bayes basically ends up working well. We're taking this SMS, we're breaking it into pieces, and we're applying a classification model. We don't know who the sender is, so we're classifying it as, say, a bank. Does that make sense?

Question? Hi, this question is for Paul. You said that in the case where customers transfer money from one account to another, most solutions are not able to handle that. How would you approach that problem?

So I think then what you're doing is getting down to the level of the actual individual. And there, which I didn't talk about here, we're actually classifying what kind of account it is. Is it a savings account? Is it a checking account? And what is the account number? Different messages will anonymize it; nobody sends your full account number, right? So you might have the last four digits, or the first two and the last two. There are different ways. So you're understanding what those structures are, and then you're creating a library, say for each user, of their account numbers. Then you're assigning a probability of whether this particular transaction is associated with this account or that account. And then you might be integrating information like how the balance moves between two messages.
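The classification step described above can be sketched with a tiny multinomial naive Bayes over word counts. The training messages, category names, and Laplace smoothing here are all illustrative, not taken from the actual system.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_messages):
    """Count word frequencies per category for a multinomial naive Bayes."""
    word_counts = defaultdict(Counter)   # category -> word -> count
    class_counts = Counter()             # category -> number of messages
    vocab = set()
    for text, category in labeled_messages:
        tokens = text.lower().split()
        word_counts[category].update(tokens)
        class_counts[category] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Return the category with the highest log-posterior (Laplace smoothing)."""
    total_msgs = sum(class_counts.values())
    scores = {}
    for category in class_counts:
        score = math.log(class_counts[category] / total_msgs)  # log prior
        total_words = sum(word_counts[category].values())
        for token in text.lower().split():
            count = word_counts[category][token]
            # Laplace-smoothed log likelihood of each token
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[category] = score
    return max(scores, key=scores.get)

# Toy training data: the message templates are made up for illustration.
train = [
    ("your a/c 1234 debited by rs 500 balance rs 2000", "bank"),
    ("a/c 9876 credited with rs 1000 available balance rs 5000", "bank"),
    ("mega sale today 50 percent off on all items", "promo"),
    ("flat discount offer ends tonight shop now", "promo"),
]
wc, cc, vocab = train_nb(train)
print(classify("a/c 4521 debited by rs 250 balance rs 900", wc, cc, vocab))  # bank
```

Even with a handful of templates, the bank-specific tokens ("a/c", "debited", "balance") dominate the score, which is why this family of models works well on templated SMS text.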
So now what you're doing is modeling things at the level of the actual user. You can classify a message across all users, but now you're getting to the level where, for a given user, you call up their accounts, call up their previous balance and subsequent balance, and see if the amounts add up. And that's how you get to that problem. None of these problems in and of itself is that hard, but layering them into each other so that you can do all of them in a pipeline is, I think, where it gets complex.

Let's say one day a bank changes its format entirely, the date and balance and all these things. Is this solution applicable for that as well?

It's hard to hear you. You're saying the bank changes the kind of message it sends? Yeah, so that's exactly what you're trying to handle: either a new message from a new type of bank, where you want to know very quickly that it's a bank and it's a debit, et cetera, or the same bank changing its template structure, where you don't want to have to manually reclassify everything. That's exactly the problem you're trying to solve for.

This question is for Paul. These days we see a lot of messages. For example, you do one transaction, you get an OTP, and then for some reason it gets declined. How do you tackle those cases, to see whether the transaction went through or was declined?

You're saying the OTP is declined? No, you get an OTP, you enter it incorrectly, the transaction finally gets declined, and you do it again. So there will be multiple messages with the same amount and all of that. How do you manage that?

Yes. Okay. That's another example of where the complexity comes in.
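The balance-continuity check mentioned above (previous balance, transaction amount, and subsequent balance should add up) can be sketched like this; the message fields and tolerance are assumptions for illustration.

```python
def balances_consistent(prev_balance, amount, new_balance, kind, tolerance=0.01):
    """Check whether one transaction amount explains the move between
    two consecutively observed balances for the same account."""
    expected = prev_balance - amount if kind == "debit" else prev_balance + amount
    return abs(expected - new_balance) <= tolerance

# Two consecutive parsed messages for the same account (illustrative values).
print(balances_consistent(2000.0, 500.0, 1500.0, "debit"))   # amounts add up
print(balances_consistent(2000.0, 500.0, 1200.0, "debit"))   # something is missing
```

When the check fails, that is a signal either that a message was missed or that the debit actually belongs to a different account in the user's library.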
It's not just the problem of the OTP being declined, or the OTP being sent twice. A lot of types of messages are sent twice. A lot of times you get ding, ding: it's the exact same transaction. The timestamp on the SMS will be different, but it's the same message. You don't want to classify that twice as two debit messages. So again, you're tracking at the level of the user, where, if you're classifying a debit message, you have rules like time spacing: the probability of it being an independent message given how recently the previous message arrived. If it were the exact same message, it would be easy to classify. But if they vary, if they send you two different types of message, one containing a balance and one not, it's not the same message, but it's still basically the same transaction. How you do that is, again, layering more of these models onto each other so that you can correctly identify that that's happening.

My question is around the quality of data you collect from SMSes. Suppose you're trying to build a balance sheet for a customer. Let's say he gets a salary every month, but there are issues with the network; sometimes messages won't get delivered, or the person might have multiple phones he's using. In those instances, how do you deal with the accuracy of the data you collect, and provide confidence that the data you have collected is good to use?

These are great questions that really get at the trickiness of the system, and why it's so important for each of the subsystems to be very modular, so you can go in and update it. So in the early days, when you're first doing it, remember we classified a sender by that six-character ID.
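The duplicate-suppression rule described above can be sketched as a simple time-window check; the window length and message fields are illustrative assumptions, not the system's actual rules.

```python
from datetime import datetime, timedelta

def is_duplicate(msg, recent, window=timedelta(minutes=5)):
    """Treat a message as a duplicate if an identical body for the same
    account arrived within a short window, even with a different timestamp."""
    for prev in recent:
        if (prev["body"] == msg["body"]
                and prev["account"] == msg["account"]
                and abs(msg["ts"] - prev["ts"]) <= window):
            return True
    return False

recent = [{"body": "a/c 1234 debited rs 500", "account": "1234",
           "ts": datetime(2017, 6, 1, 10, 0)}]
dup = {"body": "a/c 1234 debited rs 500", "account": "1234",
       "ts": datetime(2017, 6, 1, 10, 2)}
print(is_duplicate(dup, recent))  # True: same body, two minutes apart
```

The harder case mentioned in the answer, two differently-worded messages for the same transaction, would replace the exact body match with a comparison on the extracted fields (amount, account, type).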
And then in the back of the system, you've got the phone number of the user, and the phone number has messages associated with it, and you might just be tracking phone number to messages received. But then you want to go in and change the user identifier from the phone number to some other unique identifier that can have multiple phone numbers associated with it, so that you can then realize it's the same person. More interestingly, you may not know that the person has multiple phones, but you want to be able to identify that from the messages: saying, wait a second, we're seeing the same pattern, we're seeing half the messages being received on one phone and half on the other, but it's the same account. And if you merge that financial data, you actually see continuity there. Or you're seeing family members sharing an account and both of them transacting, or something like that. So that's again a problem of: can you go into the system, not mess up all the things around it, but add that additional level of complexity, so that your final output is much more granular and defined?

Questions? Let's say you have to train on all the SMSes in a distributed way. So you don't have a way to pull SMSes from users, load them onto a server, and do this complex analysis. Is it possible to run the training as well as the execution in a distributed way, on users' respective phones? Run the training of your model and the inference of the model, so that although the training happens in a distributed way, the learning is collective. Is there any way like that?

I think that's an interesting question. I'm not sure I want to venture a response, because that's not the way we approached it. But I think it would depend on the degree to which you could store models on the phone.
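The multi-phone merging described earlier in this answer can be sketched by linking phones that report the same account numbers; the data shapes and identifiers are assumptions for illustration.

```python
from collections import defaultdict

def merge_phones_by_account(phone_accounts):
    """Group phone numbers that share at least one account number.
    phone_accounts maps phone -> set of account identifiers seen in its SMSes."""
    account_to_phones = defaultdict(set)
    for phone, accounts in phone_accounts.items():
        for acct in accounts:
            account_to_phones[acct].add(phone)

    # Union-find over phones linked by a shared account.
    parent = {p: p for p in phone_accounts}
    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p
    def union(a, b):
        parent[find(a)] = find(b)

    for phones in account_to_phones.values():
        phones = list(phones)
        for other in phones[1:]:
            union(phones[0], other)

    groups = defaultdict(set)
    for p in phone_accounts:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())

# Illustrative: two phones see the same account XX1234, a third is separate.
print(merge_phones_by_account({
    "9800000001": {"XX1234", "XX9876"},
    "9800000002": {"XX1234"},
    "9800000003": {"XX5555"},
}))
```

In practice the link signal would be probabilistic (balance continuity across phones, as the answer suggests) rather than an exact account-number match, but the grouping mechanics are the same.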
So you definitely need to get outside of the system at some point, so you have enough data to train, et cetera. But maybe you could deploy the model on the phone. I am not familiar enough with Android development, for instance, to know exactly how that would work. But I've heard of other use cases where you're trying to do entirely localized prediction. I don't know the exact mechanics of how you store the model on the phone; usually it sits behind sort of an API call.

When calculating the credit rating of a person, do you factor in only the transactional message, or the service-provider message also? For example, if I book a ticket on Cleartrip but I don't make the payment using my card, I use my friend's card, then I will not get the transaction message; I'll only get the service-provider message. So do you factor that in, or do you only count it when a service-provider message as well as a transaction message comes?

Yeah. So the goal of the system is to try to incorporate that information. It's an iterative development process: you solve the basic problem, making as many assumptions as possible to make it easier, and then you do more and more. We were getting towards that piece; I think it's not fully integrated. But a lot of times this matters for other reasons too. It might not just be that you're using someone else's payment method; it might just be that, for some historical reason, you're missing data. Once your SDK is there, you can make it such that you receive every message, so you don't have to worry so much about missing data of that kind. But for some reason the phone might have been off, whatever, and you don't receive a message. You have a lot of missing transactions. And let's say you never received the bank transaction, but you made a purchase on Amazon.
You can match that purchase against that transaction and say: look, we've got a gap. If we look at balances across these messages, we see the balance going like this, or, let's say more realistically, it goes like this. We saw transactions, transactions, transactions, but we have a gap, and the next transaction had a lower balance: we're further down the slope. We don't know where that happened, but now we see an Amazon transaction happen, and we're able to place it into the process. But now what you're doing, once you've got the structured data, is the financial modeling, where you can look at someone's account, et cetera. Before you can even do that, you need to know the transactions on Amazon and the transactions at the bank. Then you can interpolate and bring them together. But that's the next step in the process. This is purely the information-extraction process. Once you've extracted it, then you can impute missing data, you can match patterns, you can do all those kinds of things. Because you don't know when it happened, you'd rather take some segment of transactions before and after that one and see if you can cross-fit it: basically identify it as the same transaction amount, that kind of thing.

Yes. So this is another example of where you get a lot of false positives. You're classifying bank messages, but Paytm messages end up looking a lot like bank messages. That's why it's important, at the personal level, to be able to identify the account-number information. You might have different formats of account numbers, but you're modeling that. And that's what I was saying to the previous question: you have a library of account numbers for a particular user, associated with each bank. So it'll be your HDFC Bank accounts and your Citibank accounts and then your Paytm accounts.
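The gap-filling step described above can be sketched as checking whether an externally observed purchase amount explains the unexplained drop between two known balances; the numbers and field names are illustrative.

```python
def explains_gap(balance_before, balance_after, known_debits, candidate_amount,
                 tolerance=0.01):
    """Return True if adding the candidate transaction to the debits we saw
    accounts for the full drop between two observed balances."""
    observed_drop = balance_before - balance_after
    unexplained = observed_drop - sum(known_debits)
    return abs(unexplained - candidate_amount) <= tolerance

# The balance fell by 1200, but SMSes only explain 700 of it; an Amazon
# purchase of 500 seen elsewhere fills the gap.
print(explains_gap(5000.0, 3800.0, [200.0, 500.0], 500.0))  # True
print(explains_gap(5000.0, 3800.0, [200.0, 500.0], 300.0))  # False
```

As the answer notes, you don't know exactly where in the gap the missing transaction falls, so in practice you would test the candidate against a window of balances before and after, not a single pair.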
That's what I was mentioning also. Am I getting it right that it's still real money in your system, whether you send it into Paytm or you send it back into your bank? So you need to be able to say, when you see a debit transaction: where is that debit going? Is it going to an external entity, or is it going into one of your own accounts? So that's again the design of the model, where you say: this is the problem we're solving, and we insert this level of models into the overall system so it runs in parallel or in sequence with all the other models.

This is for Mohan. I want to know, as new content is always coming in, how do you train your model? They show the ad for a different show, and this will always keep changing: they'll show some clip from next week's show, and the next week, a clip from the following week. So how do you keep updating your model? The content is always changing.

So let me just understand your question. You're saying the clips will keep getting updated; are you asking how we monetize that, or how we keep updating the model? Yeah. So we don't have a model: it's an unsupervised system, if that answers it. The way we extract things is exactly why this is a case of a system that requires an unsupervised approach: the data is changing very dynamically. You'll always have a new kind of thing, and training for each one would take up more time. Unsupervised methods work out better, because there are statistical patterns there that we make use of to bubble things up. But models are used to make the system more efficient. To auto-classify a clip as an ad versus a house promotion, you can use models that are trained from already-observed data. Once you have a clip that is already tagged, you can put it into a database and learn some features.
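The "statistical patterns that bubble up" idea above can be sketched as counting repeated fingerprints in a stream of clips: segments that recur are likely ads or promos. The string fingerprints and threshold here are stand-ins for whatever audio/video fingerprinting the actual system uses.

```python
from collections import Counter

def repeated_segments(clip_fingerprints, min_repeats=3):
    """Unsupervised pass: any fingerprint seen at least min_repeats times
    across the broadcast stream bubbles up as a likely ad or promo."""
    counts = Counter(clip_fingerprints)
    return {fp for fp, n in counts.items() if n >= min_repeats}

# Stand-in fingerprints: in reality these would be perceptual hashes of clips.
stream = ["ad_cola", "show_ep1", "ad_cola", "promo_next_week",
          "show_ep1_part2", "ad_cola", "promo_next_week", "promo_next_week"]
print(sorted(repeated_segments(stream)))  # ['ad_cola', 'promo_next_week']
```

No labels are needed for this first pass; the supervised models mentioned in the answer then refine the bubbled-up segments into ad versus house promotion.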
So when something related to this ad comes up later on, you're able to classify it faster. But largely it is an unsupervised system.

I think he's been waiting for a long time there. Hello, a question for you. Do you plan to open this data to the public anytime in the future? Open it up, as in the data part, just like Google Maps opens up Google's data.

Yeah, so breaks.in is an open system. We give you high-level analytics so you can observe whatever patterns are there. Some parts of it are copyright-protected; for example, the clips that we observe are actually extracted from a broadcast feed, so they're mostly for private consumption. But high-level statistics, absolutely, we want to make available to the public. You're saying you want a full fire hose of data, something more for developers, because we've got a nice data set that probably no one else has, and we could open it for developers? We can actually consider that. Or if you want to do an internship or something like that, join us on board; there's a lot of data we can work with. Thank you.

Hi. I just wanted to know a little bit more about the monetization model. More importantly, are you targeting the end customers themselves? For example, as a customer, I'm using this, I just logged into your site. It's probably good for the broadcaster or the service providers, but what is in it for the end customer? Because people move the moment some ad comes on; I switch to another channel when I get bored. What's in it for me, and how can you personalize it for the people who watch?

Yeah. So it will trickle down; ultimately the experience for the end consumer will be much better, and I'll explain how that will happen.
The reason why you see an ad 200 times is because they don't know who's seen it. There's this famous saying in television advertising: 50% of all TV ad spend is wasted, but you don't know which 50% it is. And there's also a lot of psychological research in advertising which says that a brand needs to be present probably five times before the recall is there. But after five times it has diminishing returns, and even negative brand recall, because you just hate it: why am I seeing this again? And the other thing you should notice is that the first time you see an ad, you almost always like it, because there's curiosity, and an immense amount of creativity goes into creating an ad. So creating a feedback system actually makes it more efficient, because advertisers will take up fewer spots once they know they've had those five impressions; they don't want to buy more. So slowly you will see the ad breaks start shrinking once they know who's watching, and better ads will reach people who can actually benefit from them.

Now, the truth of the industry is that you need ads to survive, because viewers are not going to bear the cost of content. Content is expensive, so the only way this can scale is through advertising, and good products also need to come in front of consumers. So better data insights help the entire industry, and ultimately, like GST, it's going to help the consumer: it's meant for the industry, and it streamlines the whole process. That is one perspective.

The other is that we are also creating an extra kind of experience for television, where you can search better. In fact, we are building systems where you can use voice and say: take me to an English movie that is not in an ad break right now.
So in general it helps the experience there. Or you can set an alert and say: come back to this channel when the ad break is over. These are all interesting experiences, even for consumers. We want to be careful about how we do this, because ads are the bread and butter of the industry. In fact, we had tested this for some time, and then we decided it's not the way to go, and removed it. Questions?

Hi. I have a question for Paul. Looking at the content, I see it is kind of syntactically structured. Have you tried exploring anything on the syntactic-parsing side of NLP, like part-of-speech tagging or dependency parse trees? In the example you had, "account 123 is debited amount this much", "is debited" is the predicate, and you have a subject and an object. Have you tried leveraging that information somewhere, or have you explored that?

Yeah. So there are definitely aspects of this where you could, I think, leverage NLP methods. There's some entity extraction. A lot of times, the way I've seen entity extraction done, a named-entity-recognition type thing, is where you're looking for any particular named entity. It's not that you're looking for one particular entity over multiple iterations, where it's variable but not in a meaningful way, like everyone's account number. As for taking an approach based on parts of speech, like you said, your verb "debited", et cetera: we played with it a little bit, but it wasn't as generalizable to a lot of the other problems. Fitting that type of model into the system was theoretically possible, and that's the point: you can leverage different approaches within the same system. Any model at any level can be its own thing and use its data in its own way; it doesn't need to use the same data as the previous model.
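As a lightweight alternative to full NER for the masked account numbers discussed throughout, here is a pattern-based extractor; the patterns and message formats are illustrative, since real bank templates vary widely.

```python
import re

# Illustrative patterns for anonymized account references like
# "a/c XX1234", "A/C **1234", or "account ending 1234".
ACCOUNT_PATTERNS = [
    re.compile(r"a/c\s*(?:no\.?\s*)?[xX*]*(\d{2,6})", re.IGNORECASE),
    re.compile(r"account\s+ending\s+(\d{2,6})", re.IGNORECASE),
]

def extract_account(text):
    """Return the visible digits of a masked account number, or None."""
    for pattern in ACCOUNT_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None

print(extract_account("Your A/C XX4521 is debited by Rs 250"))   # 4521
print(extract_account("Txn on account ending 9876 for Rs 100"))  # 9876
```

The extracted digits are what would be matched against the per-user library of account numbers mentioned in the earlier answers.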
But in terms of sheer working speed, a data scientist doing the same thing multiple times is faster than context-switching and saying, now I need to understand a bit of what's happening with the NLP. I think there's some good potential to do even more than what we tried out. And that's why I encouraged it: this is how you can get a data set, because I would love to see different methods applied. I also think certain unsupervised methods could be really interesting, in a clustering kind of way; there are some problems there, but there's definitely more that could be done. And that's what I was trying to say: a lot of times people see this and say it's commodified, it's a solved problem. But only the tiny bit at the top is solved. Doing it efficiently and effectively, across all the different use cases that were called out in the questions, is where it gets interesting. So I think this type of NLP is definitely a candidate for further exploration. Okay. Thank you.

Questions? Okay, I don't see any more questions. Thank you, Paul. Thank you, Bharat. Thank you. Thank you.