So this is Thuang Nguyen. He's a data scientist at Trusting Social. And this is Chris Sacks... sorry, Chris Stucchio. This also works, since the mic doesn't exist, so all we have to do is speak up a bit. He works at Simpl as a data scientist. Most of today's talk is fundamentally about credit scores and modeling, because if you look at these two guys' expertise, it's in modeling and data science. We'll spend a lot of time on how credit scores are modeled, how that works, what the implications are, et cetera. I'll mostly be playing devil's advocate with some of the questions I ask them, and you have the option of asking more questions as the topic goes.

There are three distinct things I want to cover. First, I looked at the people who RSVP'd, and about 15% to 20% of the audience don't really understand what a credit score is, so we'll probably spend a bit more time on what credit scores are. Second, there's how credit scores become a problem for people who don't have a prior history, which is the financial inclusion part. Trusting Social's work has mostly been around alternative inputs that can be used to create credit scores for people with no prior history, so we're going to talk about what those alternative inputs are. Third: which of those inputs are good predictors, how they work, and whether social data is a better predictor. If we find time, we'll talk about privacy, but I suspect we won't have much time for that, given the length of the talk. Fair enough.

So, to start: could you explain the whole point of credit scores?
Should everyone pull forward? Not all the way? Further back? Okay, okay, a bit closer. Especially you guys. I think we're supposed to face that way, so maybe they should come this way. Abish, is this the right way round, or do you want us facing the camera? It's fine? Okay. Can the people at the side hear us, or should we speak louder? Okay, so let the screaming begin. We're going to lose our voices. Fair enough.

Would you be able to explain what a credit score really means? I apply for a loan, and there's a number people tell you: 800, 400, whatever. But what does it really mean? Is it a predictor, and if so, of what?

Basically, a credit score is a predictor of the chance that you will pay back the loan you've applied for. For example, say the score ranges from 300 to 900. If your score is above 700, then we, as a bank, believe there's a very high chance you'll pay back the loan in full. But if your score is, let's say, below 500, then there's a high chance you'll default on the loan.

Okay, so when you say you'll default or you'll pay back, is there a number? Is there a timeframe? What does it actually mean?

When they say default, they measure the performance of the loan over a set timeframe, let's say six months or nine months. If you're 90 or more days past due, they consider that you've defaulted on the loan, even though after 90 days there may still be some way to recover the money.
But after 90 days, we consider that the consumer has defaulted. Recovery is a separate matter.

So you work at Simpl. Do you have the same concept of a credit score, or is it something more?

I'd expand the description a little. One thing you can do, and this is what we do, is lend from our own book: we issue you money that comes out of our bank account. The primary thing we care about at Simpl is whether you'll pay us back. A typical credit score like FICO or CIBIL is meant to be more general purpose. It gives up a little predictive accuracy on any particular loan in exchange for generality: will you become 90 days delinquent on any of your loans? I'll talk about FICO, since it's the one I know best. Say you have a home loan from the bank, a car loan, and maybe some consumer unsecured credit. FICO will try to predict whether you'll become 90 days delinquent on one of those loans. At Simpl, by contrast, our score is based only on whether you'll be delinquent on your Simpl bill. We don't care about your home loan or your car loan; that's the bank's problem. So credit scores are typically general-purpose scores that work pretty well across all your loans, whereas most organizations that actually issue money have their own internal score, built solely for their specific needs.

Okay, so a credit score can either be tied to a purpose, like Simpl's, or be generic across the whole spectrum?

Typically a credit score will be generic. Anytime you buy a credit score from a vendor, whether it's FICO or Trusting Social, it's meant to be general purpose and to work across all of their customers.
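Here is a minimal sketch of the default definition described above: 90 or more days past due within a performance window. It also shows the difference between a general-purpose target and a lender-specific one. The `PaymentRecord` structure, the product labels, and the six-month window are illustrative assumptions. They are not any bureau's or lender's actual specification.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PaymentRecord:
    product: str        # hypothetical product label, e.g. "car_loan" or "simpl"
    month: int          # months since the performance window opened (0-based)
    days_past_due: int  # worst days past due observed in that month


def is_default(
    records: Iterable[PaymentRecord],
    window_months: int = 6,
    dpd_threshold: int = 90,
    product: Optional[str] = None,
) -> bool:
    """Return True if any counted loan reaches dpd_threshold or more days
    past due inside the performance window.

    product=None counts every loan (a general-purpose, bureau-style target);
    a product name counts only that lender's own loans (an internal target).
    """
    for rec in records:
        if product is not None and rec.product != product:
            continue  # internal target: ignore other lenders' loans
        if rec.month < window_months and rec.days_past_due >= dpd_threshold:
            return True
    return False


history = [
    PaymentRecord("car_loan", month=3, days_past_due=95),
    PaymentRecord("simpl", month=2, days_past_due=10),
]
print(is_default(history))                   # bad under a general-purpose target
print(is_default(history, product="simpl"))  # good under a Simpl-only target
```

Whatever is recovered after the 90-day mark does not change the label here. This matches the point that recovery is a separate matter.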
Internally, though, you'll typically build a score on top of that plus other things, and that's what you use for your own lending.

So if I'm running my own lending startup, I'd probably buy a score from you and do something on top of it. You'd probably buy something from CIBIL and then look at a few other things that are more specific to you. So whatever I do, the credit score becomes an input to my own model?

Yes, typically you'll want to do a bit more than use the score unmodified. Here's an example. A consumer unsecured loan is when you lend money and hope the customer pays it back, and if they don't, you're out of luck. Then there are secured loans, for instance a home mortgage. People put a lot more effort into paying those off than consumer unsecured loans, because if they don't, the lender can eventually repossess the house and they have to find somewhere else to live. In the US, medical loans are some of the first to be written off, so people typically pay their medical bills last.

Okay, so I'd want a much higher credit score for a medical loan than if I'm financing, let's say, a car that I can switch off remotely if you don't pay the bills. How do student loans fit into this mix?

In the US, student loans are guaranteed by the government, and you can't wipe them out in bankruptcy. Over here, I don't know how it works.

Yeah, here nothing is guaranteed by the government.

Right. In the US they're guaranteed by the government, and even if you go bankrupt you can't discharge them.

So when you score student loans, in the US and elsewhere, what do you use as the prior?
Because if I've just graduated from some university, I probably don't have any earnings at all.

Student loans are a big, complicated mess, and they're more about government policy than underwriting.

But how is the problem addressed in other places?

Let me put it in the context of Australia, because student loans are a big policy issue there too. In Australia the government runs student loans, and the money comes from government funding. You don't need to repay the loan if your income after graduation is below, let's say, $50,000 per year. Student loans there are a bit different because the government manages them. The ATO already knows how much you earn each year, and it makes you pay the loan back through the tax system.

The reason I ask about student loans is that there's no such thing as a guaranteed student loan in India. There are startups that try to sell loans to students, and they get into all kinds of trouble with defaults. So the thing I keep asking is: if these people have no history, no job, no assets, nothing, how do you even give them a loan? Would alternative models work, or is the answer just "I'm not going to give you a loan"?

Well, you still have some predictor of what their earnings will be. There are two questions: will the person earn enough to pay it back, and are they likely to do so? Keep in mind that any loan you issue is part of a pool. Some of those loans will go delinquent, and you have to work out the interest rate to charge everyone else so that you make up whatever you lost. So, for instance, a bunch of IIT grads are probably going to make pretty good money. Pune University grads, probably a bit less.
Similarly, there are regional differences in repayment rates: some cities and states have much lower repayment rates than others. If a borrower is from one of those areas, you'd probably charge a higher interest rate or just not do business there. And once you have some historical data, you'd probably see patterns across colleges as well.

Do you see these kinds of problems in Vietnam? Student loans matter to me for two reasons. First, there isn't much prior data. Second, there's a distinct unbanked community in some developing countries. So do you see more use of alternative data sources for them than for others?

In Vietnam, our business doesn't focus on any particular customer group or segment, and we haven't looked specifically at student loans in that context.

Okay, then let's move to the second part. Does anyone have other questions on credit scores before we move to alternative data sources?

I have a question.

Please ask.

How standard is the 300 to 900 range, and how are scores distributed within it? Are they spread out evenly, or does a large percentage fall in the middle?

The only place I've seen data on this is FICO, and even that is typically only for specific use cases. For those, the distribution is not quite normal but a somewhat wider, smeared-out version of a normal. The 300 to 900 range itself could just as easily be zero to one; they use 300 to 900 because it's easier for consumers to understand. That's just marketing. As for applicants, where that distribution is centered depends to a great extent on your marketing, that is, on who you're marketing to.
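The point that the 300 to 900 range is a presentation choice can be illustrated with two ways of mapping a model's repayment probability onto that range. The linear map is the simplest. The log-odds version follows the "points to double the odds" convention that is common in scorecard practice. The anchor values are illustrative assumptions, not FICO's or CIBIL's actual calibration:

- 600 points at 50:1 odds;
- 20 points to double the odds.

```python
import math


def linear_score(p_repay: float, low: int = 300, high: int = 900) -> int:
    """Stretch a repayment probability in [0, 1] linearly onto [low, high]."""
    if not 0.0 <= p_repay <= 1.0:
        raise ValueError("p_repay must be in [0, 1]")
    return round(low + (high - low) * p_repay)


def log_odds_score(
    p_repay: float,
    base_score: float = 600.0,  # assumed score at the base odds
    base_odds: float = 50.0,    # assumed good:bad odds of 50:1 at base_score
    pdo: float = 20.0,          # assumed points needed to double the odds
    low: int = 300,
    high: int = 900,
) -> int:
    """Scorecard-style scaling: the score rises by `pdo` each time the
    odds of repayment double, then is clipped to the display range."""
    if not 0.0 < p_repay < 1.0:
        raise ValueError("p_repay must be strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = p_repay / (1.0 - p_repay)
    score = offset + factor * math.log(odds)
    return int(min(max(round(score), low), high))


print(linear_score(0.5), log_odds_score(50 / 51), log_odds_score(100 / 101))
```

Both maps preserve the ranking of applicants, which is what a lender actually acts on. Only the numbers shown to consumers change.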
If your marketing targets a bunch of poor people who lose their jobs a lot and sometimes just don't feel like paying the bill, the distribution will be centered closer to 300. If you're targeting a bunch of rich yuppies in Greenwich, Connecticut, it'll be centered pretty close to 800.

At Simpl, presumably, you have your own credit scores in your own models, so you have your own predictions. Do you find that traditional scores still add signal on top of yours?

We've actually found that traditional credit scores don't add much to what we already have. In the past they added a little, but it wasn't worth the extra engineering effort to keep them, and at this point we don't use them at all.

So your credit score is limited to the data sources you have?

Yes.

But there are more data sources outside your credit score?

Correct.

How do you work around that?

We obviously can't use data sources that aren't in our system. But what makes our data well suited to our purpose is that it's typically based on things like transaction history at merchants where Simpl is available. We've experimented with alternative data sources, a lot of which are very different from this. With many of them, we get a lot of data on people who never even use Zomato and therefore never see the Simpl button. They never use BookMyShow, never use Dunzo. If you're not on Zomato, Dunzo, BigBasket or one of the other merchants Simpl is on, it doesn't matter how good or bad we think you are. You never see the button, so you never get the chance to use it.

There are some scenarios where people change their identity: they default, get a different phone number, change their name, and then transact again on some platform. How do you tell the existing user apart from the new identity?
I can't really say, because then they might figure out a way around it. Someone might be listening.

Okay, so the general rule is: don't push them too hard on trade secrets, but generic answers are okay.

It's not that we're unwilling to share with the community. But if we say something, people who want to do that kind of thing will know how to hack the system. That's the only reason. There are hundreds of thousands of people out there who really want to steal our money, so we have to be a little quiet about these things.

Okay, fair enough. So let's come back to a hard one: do you employ collection agents?

We have our own people who call up and say, please pay your bill, and you get some reminder messages. But we don't send anyone to your house to take your furniture if you don't pay. Although I think some of the merchants we're going live with involve rentals, so they may wind up taking your furniture, or your TV or whatever it is they rent, if you don't pay. I don't know exactly how it works if you don't pay the bill there; that's all in their terms.

Fair enough. You said Trusting Social does credit scoring. Are you also moving toward Simpl's model, where you put in your own money and lend?

So far, we partner with financial institutions on credit scoring and get a share of the revenue they earn. But in terms of the company's vision, we are pushing toward our own lending platform.

So eventually, do all companies that build credit score models superior to FICO end up doing their own lending? Is that where the money is? Okay.

Here's a provocative question. I've been hearing about big data for five to ten years, and that by the time you apply for a loan you'll receive a machine-learned credit score.
Yet the FICO paradigm, and the way banks operate, hasn't changed drastically. Why hasn't it happened sooner? As far as I can tell, no company has nailed it, convincingly using machine learning for credit scoring to the point where we'll all have to do it.

Quite a few companies have done that. LendingClub, Klarna, Afterpay, WePay: those are some billion-dollar companies that do this. I'm not 100% sure the WePay lending line is bigger than a billion, but it's close. There are several more in Latin America whose names I don't remember off the top of my head. So it certainly is out there. What also happened is that a lot of consumer unsecured lending based on alternative data has started going through banks. Goldman Sachs discovered there was a ton of money in this, and now banks do it. That's what Goldman Sachs' Marcus is about. Goldman Sachs literally never ran consumer banking before. Then they discovered how much money there is to be made, and now it's under a different brand, Marcus, but they do it. So I'd say FICO is still there because FICO works. No doubt about it, it's a great way to lend money, and people aren't going to throw away something that works. But at the same time, the alternative approaches cover use cases that FICO doesn't, and as far as I can tell they're working.

It feels like those examples are niche cases, where people believe in their own model and can make it work because they have specific signals, like in your case. What I'm struggling with is this. You still have the FICO paradigm, and as you said, it works, and people rely on it for generic credit needs. Why haven't we seen a more general machine-learned credit score that has become its own paradigm, rather than just niche use cases like LendingClub and so on?
I'm not sure exactly what you mean by niche. Is there really a difference in the market now?

By niche I mean specific use cases that are not generic.

So I think your question is: why can't we have a generic alternative credit score that works on all kinds of problems? Is that your question? I think this goes back to the root of machine learning. If you hire a person to assess loans and decide whether to approve them, that person can handle almost any loan you bring them. But a machine learning model works differently. It needs training data, and normally it only works on new data that's quite similar to its training data. So if your training data comes from one type of loan, the model may not work for cash loans, for example. That's why we have to build separate models for different kinds of problems.

I'd also suggest that if FICO hadn't come first, there would be a need for something like it. Suppose the machine learning models currently used for alternative cases are as good as FICO. Since FICO came first, probably no one will switch unless the new model has some dramatic advantage. There's a sort of first-mover advantage: if FICO is there and you're only as good as FICO, I'll just stick with FICO and use you in the cases where FICO doesn't work. So that plays a role as well. And like I said, FICO is actually pretty good. It's hard to beat when you have that level of data, and much easier to beat when the data is missing. That might also be why the alternative approaches are mostly used when FICO is unavailable. But to squeeze out basis points, many banks are in fact incorporating alternative data on top of FICO and traditional credit histories. Another thing that's a little tricky is that in the US
there are significant regulatory risks to doing this. Specifically, there are laws related to fairness in lending. I don't know how many people here are familiar with America's racist history. The thing is, suppose you plot delinquency rate against FICO score separately for each race: Asian people, Black people, white people, et cetera. You discover significant differences between these curves, so that's a signal you could use. Now, a machine learning model might accidentally learn this signal, which is actually illegal to use. And in fact they do. My first foray into lending was a project where I was just trying to teach these guys how to do machine learning in Scala. Then I noticed, wait, I can make this model a whole bunch better. I said, look, I made the accuracy go up by about 8%, look how much more money you could make. And they said, yeah, the regulators will never let us do this.

What, you were using ethnicity?

No, I was using some location data and a couple of other things that happened to be correlated with it. The machine learning model just happened to pick up the proportions of ethnicities in certain regions, plus a couple of other clues.

So indirectly it was still an issue. That's interesting. So you were effectively proxying for ethnicity?

I had a very nonparametric model, and I didn't know what it was predicting. I just noticed that there were patterns and this thing seemed to detect them. It's like you train a neural network and suddenly it picks out ethnicity from locations and which stores people shop in. Certainly not with anything like 100% accuracy, but 70% is enough to squeeze out more money. So they told me the regulators just wouldn't allow it at all.

So it's a no-go?

Yes. So that also prevents some of this from happening, at least in the US. I don't know what happens in Europe.
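One common way to check whether features act as a proxy for a protected attribute, as in the location example above, is to measure how well those features predict the attribute itself. The sketch below does this for a single categorical feature. For each feature value it guesses the most common group, and it compares that accuracy with always guessing the overall majority group. The region and group labels are made up. On real data you would score this on held-out records, because in-sample accuracy overstates the leak.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence


def proxy_accuracy(feature: Sequence[Hashable], protected: Sequence[Hashable]) -> float:
    """In-sample accuracy of guessing the protected attribute from one
    categorical feature, predicting the most common group for each value."""
    counts_by_value = defaultdict(Counter)
    for value, group in zip(feature, protected):
        counts_by_value[value][group] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts_by_value.values())
    return correct / len(protected)


def majority_baseline(protected: Sequence[Hashable]) -> float:
    """Accuracy of always guessing the single most common group."""
    return Counter(protected).most_common(1)[0][1] / len(protected)


# Made-up data: the region alone says a lot about group membership.
regions = ["north"] * 10 + ["south"] * 10
groups = ["A"] * 7 + ["B"] * 3 + ["A"] * 2 + ["B"] * 8
print(proxy_accuracy(regions, groups), majority_baseline(groups))
```

Here the region gives 0.75 accuracy against a 0.55 baseline. That gap means the feature carries information about group membership, so a model using it can pick up the forbidden signal even though the attribute itself is never an input.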
Could you say more about... sorry, just one quick question.

Yes, go ahead.

If you only lend to people with high FICO scores, they pay you less in interest. So to earn that revenue, you need to lend to people with lower FICO scores. How do you optimize that?

Sorry, say it again?

If you lend only to people with high FICO scores, you give up the revenue you'd get in interest. To get that revenue, you need to lend to people with lower FICO scores too. So how do you optimize that, or do you optimize it at all?

Ultimately, if you're running a good loan portfolio, you shouldn't be making more money off one cohort than another. Let's say I'm lending to a cohort with a 5% delinquency rate. I have to charge about 5.3% interest just to break even. Factoring in the cost of running the operation and making a profit, I'm probably charging seven or eight percent, which gives me about a 3% profit margin, minus the actual expenses of hiring people in collections and so on. Now say I want to lend to a group with a 10% delinquency rate. I'll probably have to charge 12 or 13% interest, and I'm still making that 3%.

So you set the interest rate based on the segment?

Based on the segment, yes. So ultimately there's no extra money in it, unless a segment is underserved and you can charge excess interest rates. That's what's currently happening in consumer unsecured lending in the US and Europe. But as more people enter the market, they offer lower interest rates to compete, and it'll get back to the same as everything else.

Anyone else?

Yes. I'm just curious: is there any study that has actually broken down the costs and shown that charging a riskier segment, say, 5% extra really works out? Is there some kind of standardized benchmark that validates this?
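The break-even figure quoted above follows from a simple one-period model. If a fraction p of the pool defaults and recovers nothing, the rest must repay enough to cover everyone, so (1 - p)(1 + r) = 1 + m for a target return m. This sketch makes several simplifying assumptions: delinquency means total loss, there are no recoveries, and operating costs are ignored. With p = 5% it gives a break-even rate of about 5.26%, matching the 5.3% quoted. The exact rate for a 3% margin comes out a little above the rough "delinquency plus margin" figures in the discussion, because defaulted loans forfeit their interest too.

```python
def break_even_rate(default_rate: float) -> float:
    """Rate at which the non-defaulting share of the pool repays the whole
    pool's principal: (1 - p) * (1 + r) = 1. Assumes zero recovery."""
    return default_rate / (1.0 - default_rate)


def rate_for_margin(default_rate: float, margin: float) -> float:
    """Rate giving the pool a return of `margin` per unit lent:
    (1 - p) * (1 + r) = 1 + m. Assumes zero recovery."""
    return (1.0 + margin) / (1.0 - default_rate) - 1.0


for p in (0.05, 0.10):
    print(f"default rate {p:.0%}: break-even {break_even_rate(p):.2%}, "
          f"with a 3% margin {rate_for_margin(p, 0.03):.2%}")
```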
There are actually fewer academic studies on this than you'd expect, and they're mostly interested in something else; the validation is incidental. At the same time, all the lenders who use this and don't go under are a kind of skin-in-the-game test of whether it actually works.

Well, the idea is extraordinary: you're lending to people who get pushed up higher, and maybe screening out people who may not pay.

But if I'm wrong, I just lost money. This is one of the things I really like about working at Simpl. I can ship code, and if it doesn't work, damn, I just lost a bunch of money; if it works, the company made money. Either way, the proof is in the pudding.

Is there any evidence from something like a control group? Who actually runs that, and is any of it published?

Yes, most validation datasets will include this. I don't know exactly what you mean by a strict A/B test.

What is a strict A/B test here? Say I know the people of Khandedkeya, Bessdhan and Korangeshpur, and I'm supposed to lend to them. I'd have to go ahead and see how many of them actually pay back.

Oh yeah, I'm doing that right now. I do that all the time, just to see.

So you'll be using transaction history for a large number of people. Someone's recent history might be more positive. But if the models rely on older history rather than the latest, you might not be lending to users who are now in a better situation, which would be a missed opportunity. Is that something you're seeing?

Now we're getting into things we can't say that much about.

But you gradually adjust the score over time? You can at least say whether you do that.
Sorry, what are you asking?

At a very high level, what he's asking is: if you compute a score for a person at a point in time, do you re-evaluate it based on their later behavior?

Oh yeah, we re-evaluate almost in real time.

Okay, one more question before we move to financial inclusion.

My question is about affordability versus credit assessment. Do you see them as separate, or do you build affordability into credit risk assessment? I'd like your thoughts on affordability compared with the very old-school approach. How do you decide on the loan amount and what's right for the borrower?

At Simpl, we're not lending enough money for this to really matter. We mostly cover your movie tickets and your lunch, so it's probably affordable to you.

Do you have anything to add from what you're doing?

We're not in micro-lending at the moment, but like Simpl, our product is still small compared with, let's say, a car loan or a home mortgage. So I think affordability is not a very big deal here.

Yeah, but I know that people who do things like car loans and home mortgages look very closely at debt-to-income, stability of income, that kind of thing. If you earn a lakh a month and you want to buy a five-crore house, no matter how good your credit score is, they're just going to say no.

Okay, let's move to the second part, financial inclusion. You've done a lot of work on alternative data, because, as you said, there's a large unbanked population in Southeast Asia. Can you explain a bit more about that?

As far as I remember, there are about 1.7 billion people who don't have any banking or credit history. For those people, the FICO score doesn't work, right?
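The one-lakh-income, five-crore-house example can be made concrete with the standard amortizing-loan (EMI) formula and a debt-to-income cap. The following values are illustrative assumptions, not any lender's actual policy:

- an 8.5% annual interest rate;
- a 20-year tenure;
- a maximum debt-to-income ratio of 50%.

The sketch also treats the full house price as the loan amount.

```python
def monthly_installment(principal: float, annual_rate: float, years: int) -> float:
    """Equated monthly installment for a fully amortizing fixed-rate loan."""
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        return principal / n  # no interest: repay principal in equal parts
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


def affordable(monthly_income: float, existing_payments: float,
               new_payment: float, max_dti: float = 0.5) -> bool:
    """True if total monthly debt payments stay within max_dti of income."""
    return (existing_payments + new_payment) / monthly_income <= max_dti


income = 1_00_000       # one lakh rupees per month
house = 5_00_00_000     # five crore rupees, assumed fully financed
emi = monthly_installment(house, 0.085, 20)
print(f"EMI: {emi:,.0f} per month, {emi / income:.1f}x monthly income")
print("affordable:", affordable(income, 0, emi))
```

Under these assumptions the installment alone is several times the borrower's monthly income. No credit score can offset that, which is the panelist's point.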
So what we do is use alternative sources of data. Without them, those people get rejected by the bank or get a loan at a very, very high interest rate. Instead, we distinguish the good customers from the bad ones, so we can still help the bank give better loans to those customers. That's how alternative credit scoring works. We're not trying to replace FICO, as Chris said. FICO still works very well in certain applications, let's say car loans or home loans. But for smaller loans, say a few thousand dollars, our aim is to help certain groups of people who don't have any banking history.

Okay, so would it be more accurate to say that the segment of the population with FICO scores and the segment you target are complementary, or mutually exclusive?

I think it's complementary. FICO will give you a score anyway: even if you don't have any banking history, they still give you some score. But it's normally very, very low. So the bank will either not approve your loan or charge a very high interest rate, because they can't tell who is good and who is bad in that group of customers.

Okay. We have a lot of usury laws in India that prohibit lending above a certain interest rate. Do you have such laws in Vietnam, or a similar issue?

I'm not sure about Western countries, but I think most Asian countries face the same problem. Take Vietnam: there are certain groups of people who are short of money all the time and need cash flow to keep their businesses going. The banks won't lend to them, because the banks don't have enough information to assess their creditworthiness.
So they end up in what I'd call a kind of black market. They borrow from people who are willing to lend without any property or collateral to secure the loan, but the interest rate is very high, say three to five percent per day. And if you don't pay that amount, the interest becomes principal the next day. So people get trapped in debt and can't pay back the loan anyway. The Vietnamese government is trying to address this, so it wants to push banks to lend to those people, but it doesn't yet have a clear policy on how to implement that.

Okay. So what alternative sources of data do you use for these people?

I'm afraid I can't reveal much. But at Trusting Social we use telco data and other sources such as social media, anything we can get from the customer online. We also have our own platform, our own app. If you want a loan from our company, you install the app on your phone and we get your consent to collect a few items of data about your phone usage. We use that to predict your score.

Okay. How reliable are the predictors? Better than nothing?

In terms of performance, in the markets we work in so far, the banks are giving us good feedback. At the very least, it's better than what they get without our score.

Okay. So you have a feedback loop on loan performance with the lending institutions?

Yes, we run back-tests with the banks on an ongoing basis to assess the performance of our credit scores.
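To show how quickly the "interest becomes principal the next day" arrangement traps borrowers, here is the daily-compounding arithmetic for the three to five percent per day rates mentioned above. It assumes no repayments at all and a single fixed daily rate. That is a simplification of how such informal lenders actually operate.

```python
import math


def balance_after(principal: float, daily_rate: float, days: int) -> float:
    """Balance when each day's unpaid interest is added to the principal."""
    return principal * (1 + daily_rate) ** days


def days_to_double(daily_rate: float) -> int:
    """Smallest whole number of days for the balance to at least double."""
    return math.ceil(math.log(2) / math.log(1 + daily_rate))


for rate in (0.03, 0.05):
    print(f"{rate:.0%} per day: doubles in {days_to_double(rate)} days, "
          f"{balance_after(1.0, rate, 30):.1f}x after 30 days")
```

At these rates the debt doubles in a matter of weeks. A borrower who misses even a month of payments owes several times the original amount.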
So, on the machine learning cold-start problem: you solve that by collecting some data up front and being prepared to write off a certain amount in losses. Once you're past that and using all the data sources, the model needs feedback from the underwriting institutions about how the loans performed in order to improve. Is that how you get better?

Yes, it's a kind of loop. First we have to handle the cold start and obtain some training data. Then, based on our prediction model, we lend to certain customers and get feedback on how those loans perform. We use that as extra data to improve our model.

When you do that, do you underwrite for the lending institution, or do you just provide the data and let the lending institution decide how to use the model?

In the current business model, because we partner with financial institutions, we just act as a score provider, and the financial institution decides whether to approve the loan. But on the new platform we just launched in Indonesia, we decide whether to approve the loan, because that's our own lending platform.

Any other questions? Keep asking.

Let's say we have up to ten data sources. The problem is that for some customers we have one to three of those sources, for some five, six or seven, and for some eight or nine. How do you reach a combined decision? You can't build a model for every combination. So how do you use all these different alternative data sources together to drive the business?

I think we now have models that are robust to missing data.
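Here is a minimal sketch of two generic ways to handle partially missing data sources:

- filling a missing feature with its mean;
- a fallback ensemble that uses a model trained with a dominant source when that source is present, and a model retrained without it otherwise.

The combined scores are then evaluated by ROC AUC as a whole. The feature names (`telco_score`, `app_score`) and the toy scoring functions are hypothetical stand-ins for real trained models. They are not Trusting Social's or Simpl's actual setup.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]  # None marks a missing data source
Model = Callable[[Row], float]


def fit_mean(rows: List[Row], feature: str) -> float:
    """Mean of a feature over the rows where it is present."""
    return mean(r[feature] for r in rows if r.get(feature) is not None)


def impute(row: Row, feature: str, fill: float) -> Row:
    """Copy of row with a missing feature replaced by the fill value."""
    out = dict(row)
    if out.get(feature) is None:
        out[feature] = fill
    return out


def fallback_score(row: Row, full: Model, reduced: Model, key: str) -> float:
    """Use the model that relies on `key` when it is present, else the retrained one."""
    return full(row) if row.get(key) is not None else reduced(row)


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Chance a random positive (label 1) outscores a random negative; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def full_model(r: Row) -> float:
    """Hypothetical model dominated by the telco source (repayment score)."""
    return 0.8 * r["telco_score"] + 0.2 * r["app_score"]


def reduced_model(r: Row) -> float:
    """Hypothetical model retrained without the telco source."""
    return r["app_score"]


applicants: List[Row] = [
    {"telco_score": 0.9, "app_score": 0.7},
    {"telco_score": None, "app_score": 0.6},
    {"telco_score": 0.2, "app_score": None},
    {"telco_score": None, "app_score": 0.1},
]
# For brevity the mean is fitted on these same toy rows; in practice fit it on training data.
app_mean = fit_mean(applicants, "app_score")
prepared = [impute(r, "app_score", app_mean) for r in applicants]  # only app_score is imputed
scores = [fallback_score(r, full_model, reduced_model, "telco_score") for r in prepared]
repaid = [1, 1, 0, 0]
print(scores, roc_auc(repaid, scores))
```

The combined model is evaluated exactly like a single model. Only the routing of each applicant to a sub-model differs.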
So basically, if you have that kind of situation, I think we can still use all 10 data sources, and for certain customers who are missing certain sources of data, we accept that as a missing value. What if one particular data source is so powerful that it doesn't let the other data sources come up as significant in the model? In that scenario? I think we have to cover that in our training data. Anyone else? Let me just comment on that also. This is just a general machine learning tip. Let's say you build a model and it is dominated by one factor when you have that factor available. What you can do is delete that factor, retrain the model, get something else with lower accuracy, and then build an ensemble of the two models; this is typically a way to handle it. So you use one when the factor is available and the other one when it isn't. You still have to evaluate the thing as a whole: find its ROC AUC, find its precision and recall, whatever. The point is that when you build this combined model, you still have to evaluate it the same way you would evaluate a single model, but that is typically the way you would handle situations like this. Another thing is that there are imputers of various sorts in your favorite machine learning library: go with the average if the value is NaN. So you can always do that kind of thing as well, depending on how important the feature is. Fundamentally, in any loan business your principal is at risk, and your interest component is at risk, assuming it gets written off, which is ultimately going to eat into your margin. And figures in the US have been like one to 3% on, say, an average 18% interest rate. Now the question is how much of your capital is at risk, and just knowing the default is one element of that, right? You said default is non-payment for 90 days. That doesn't necessarily translate into how much recovery you make.
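The two tips above (train a second model without the dominant factor and use it when that factor is missing, and mean-impute the remaining gaps) can be sketched as follows. This is a minimal illustration under assumptions: `strong_model` and `fallback_model` are hypothetical stand-ins written as plain logistic functions rather than trained models, and the feature names `telco_score` and `other_signal` are invented for the example.

```python
import math
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]
DOMINANT = "telco_score"  # hypothetical name for the dominant data source


def is_present(row: Row, feature: str) -> bool:
    value = row.get(feature)
    return value is not None and not math.isnan(value)


def fit_means(rows: List[Row], features: List[str]) -> Dict[str, float]:
    """Column means over the training rows, skipping missing (None/NaN) values."""
    means = {}
    for f in features:
        seen = [r[f] for r in rows if is_present(r, f)]
        means[f] = mean(seen) if seen else 0.0
    return means


def impute(row: Row, means: Dict[str, float]) -> Dict[str, float]:
    """Fill each missing feature with its training mean ("go with the average if NaN")."""
    return {f: (row[f] if is_present(row, f) else m) for f, m in means.items()}


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def strong_model(x: Dict[str, float]) -> float:
    # Hypothetical model trained WITH the dominant source; it leans on it heavily.
    return logistic(1.0 - 3.0 * x[DOMINANT] + 0.2 * x["other_signal"])


def fallback_model(x: Dict[str, float]) -> float:
    # Hypothetical model retrained after deleting the dominant source (lower accuracy).
    return logistic(-1.5 + 0.4 * x["other_signal"])


def predict_default(row: Row, means: Dict[str, float]) -> float:
    """Route to the strong model when the dominant source exists, else to the fallback."""
    others = {f: m for f, m in means.items() if f != DOMINANT}
    x = impute(row, others)  # mean-impute everything except the dominant source
    if is_present(row, DOMINANT):
        x[DOMINANT] = row[DOMINANT]
        return strong_model(x)
    return fallback_model(x)
```

As the speaker stresses, the routed predictor is then evaluated as one model: compute ROC AUC or precision/recall on `predict_default` over the whole test set, not on each sub-model separately.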
And so the question is: I think we've been focused on machine learning or data science for default prediction, but are there other use cases that get a much tighter handle on capital at risk beyond merely default prediction? So there's a number of things you can predict. One of them: FICO is basically attempting to predict being 90 days delinquent on at least one loan. But there are other models. As I was describing, depending on what specific lending you're doing, people will pay off their house first and their medical bills last, because they don't want to lose their house, whereas the hospital can't repossess their repaired heart. So typically these things get fed into the model, and the adjustments you make to, let's say, FICO are often based on exactly things like this. Additionally, you can attempt to predict, let's say, recovery after 180 days, and if that is non-zero, you discount it by the time value of your money. So all these things go into actually pricing a loan, and people who are making much bigger and longer-term loans than I am will be much better at it than I am. Is it just default and then recovery expectation? Those two factors? No, it depends on the kind of loan. In the US, if you're talking about mortgage lending, there are other risks, like interest rate risk: let's say you have a 30-year loan with a certain interest rate. It has a certain value, but if interest rates go up, that reduces the value of your loan relative to treasury bills that you might buy today. Another thing: the US has a really weird structure for home loans. It comes with an embedded option that lets you pay it all off at once if you want.
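The pricing idea described above (predict default, predict recovery after 180 days, discount the recovery by the time value of money) can be sketched as an expected-loss calculation. This is a simplified illustration under stated assumptions: one default probability for the loan, recovery arriving as a single lump a fixed number of days after default, and a flat annual discount rate. The function names and example numbers are hypothetical.

```python
def discount_factor(annual_rate: float, days: int) -> float:
    """Present value of 1 unit received `days` from now, at a flat annual rate."""
    return (1.0 + annual_rate) ** (-days / 365)


def expected_loss(exposure: float, p_default: float, recovery_rate: float,
                  recovery_days: int, annual_rate: float) -> float:
    """P(default) x exposure x (1 - present value of the expected recovery rate)."""
    pv_recovery = recovery_rate * discount_factor(annual_rate, recovery_days)
    return p_default * exposure * (1.0 - pv_recovery)


if __name__ == "__main__":
    # Hypothetical loan: 10,000 lent, 5% chance of default, 40% expected to be
    # recovered 180 days after default, money valued at 18% a year.
    loss = expected_loss(10_000, 0.05, 0.40, 180, 0.18)
    print(f"expected loss: {loss:.0f} ({loss / 10_000:.2%} of the amount lent)")
```

The later a recovery arrives, the smaller its present value, so the same recovery rate offsets less of the loss. The interest-rate and prepayment risks mentioned for US mortgages would need further terms that this sketch does not model.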
This is called prepayment risk. What can happen is, let's say interest rates drop significantly: the borrower might take out a new loan to pay off the old loan, and suddenly they're paying a lower interest rate and you get nothing. That's called prepayment risk, and it is another major factor in home lending in the US. I don't believe it works quite the same way here. Prepayment works here. Prepayment works? Yeah. You get the embedded option, but durations are also typically not 30 years, right? Yeah, 30 is unheard of, but 20 or 25 years we usually give. But there is also a charge: if you prepay within a certain period, you're at a loss. There's a penalty, that's all; that's what they normally do. So the penalty is probably carefully calculated, and it's basically what you would lose. Yeah. In the US, you don't actually have those penalties. But there are also balloon payments here, right? Yeah. Okay. So the other question I have for you is, what kind of inputs are you using? Is there a single factor that has predictive value, like location, or do you need a lot of factors together to say this is good enough? So again, I can't reveal much, but from our data we can extract up to 10,000 features, and we run a machine learning process to reduce them to the 100 or 200 factors that are most useful. So if you ask whether we can say which factor affects the outcome of the credit scoring, I can't give you an answer, because machine learning models sometimes work as a black box, and there are many factors inside that interact with each other, so we can't give you a clear answer about which factor leads to the result. Okay. The reason I ask is that when you start off, right, you start with heuristics and you keep improving on them, and at some point, do your heuristics still work or not?
So in our situation, sometimes a prediction with very good performance is not a good choice, because in that case you can run into an overfitting problem with your training data. So we try to balance our prediction performance against the performance on the real loans that we are making with the bank. Okay. Any other questions on alternative data sources and models? Can you explain the last point? So we run into situations where, when you train a model, even though you don't see the test data, you tune the model and gain better performance on the test data, but you unconsciously tune the model so that it overfits your test data, even though you never put your test data into the training process. That's why I'm saying we can run into the problem of overfitting even to the test data. What if we take other test data and check again? Yeah. We have to figure out how to balance that. How do you do that? How do you know whether you're overfitting or not? I can't reveal more, but we have to run backtesting very frequently. Another question I have about these AI/ML techniques: the first time, you don't know what your features are, and you use your approaches to find them, but as you get more data, if you rebuild the model completely, your features will change. So what kind of approach do you take? Are you okay shifting your parameters incrementally, or do you rebuild the entire model? Yeah, I think we have to accept that if we train a model and then use it for a long time, it's not going to work, because our data are going to change: the way customers behave and the way they evolve keep changing. So we have to retrain the model, fit it again, and do that regularly. Probably every six months?
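One common guard against unconsciously tuning to a single fixed test set, and a way to run the frequent backtesting and regular retraining described here, is a walk-forward (out-of-time) split: train only on periods whose outcomes were known before a cutoff, test on the following period, then roll forward. This is a generic sketch, not Trusting Social's procedure; the period indices, window lengths and the `expanding` flag are assumptions for illustration.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_splits(n_periods: int, train_len: int, test_len: int,
                        step: Optional[int] = None,
                        expanding: bool = False) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) period ranges where every test window comes after its training data.

    expanding=False keeps a fixed-length training window (old data is dropped);
    expanding=True always trains from period 0 (the whole history is kept).
    """
    step = step or test_len
    start = 0
    while start + train_len + test_len <= n_periods:
        train_start = 0 if expanding else start
        train = range(train_start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += step


if __name__ == "__main__":
    # e.g. 12 months of loan outcomes, train on 6 months, test on the next 2
    for train, test in walk_forward_splits(12, 6, 2):
        print(f"train months {train.start}-{train.stop - 1}, test months {test.start}-{test.stop - 1}")
```

Each split is one backtest: fit on the training periods, score the next periods, and track the metric over time. A model that only looks good on one fixed test set is a warning sign. The `expanding` flag is the knob behind the "entire history or throw away old data" trade-off raised in the discussion.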
And when you redo the model, do you still consider your entire history, or do you throw away some of the old data? We have to balance how much history we cover against the performance of our model. I mean, we train much more frequently. Basically, the Simpl cycle is that you pay the bill twice a month, so we basically train that frequently. One other thing I would mention is that, at least for us, there is an adversarial nature to this as well. So essentially, first of all, there's the ordinary underwriting we're doing to determine whether you personally, this real person sitting in front of me, are likely to pay your bill as a good credit risk. And separately, there is the fact that you're not actually a person sitting in front of me; you're a person out there in the world sending signals to my server somewhere. You might be shocked to discover that there are people out there who will just attempt this over and over again: take out a thousand bucks, take out another thousand bucks, try to get a bunch of free food, that kind of thing. So another aspect of all this is that, whatever we're building, we also have to be careful that some human can't come up with an adversarial attack on it. And then, oh, where did that mic go? What's your benchmark for the Gini and the AUC that you might use? For the what? For the metrics that you might be using. What metrics do you use? So it's some combination of ROC AUC, and we also just focus on the numbers: I'll take a lower ROC AUC if I can also get accurate calibration. And also, since it directly affects our growth, it's finding how many people are actually in the good set. So I'll lower my ROC AUC if I just get more people who are definitely below 10% or 20% risk. I see yours is more like a lending business, a credit card business. Yes. A twice-a-month payment model. Yes.
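The three things weighed in the answer above (ROC AUC, calibration, and how many applicants land confidently in the low-risk "good set") can each be computed in a few lines. This is a plain-Python sketch for illustration; the function names, the 5-bin calibration table and the 10% threshold default are assumptions, and the pairwise AUC is O(n²), so a library implementation would be used on real data.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], risk: Sequence[float]) -> float:
    """Chance a random defaulter (label 1) is scored riskier than a random non-defaulter; ties count half."""
    bad = [s for y, s in zip(labels, risk) if y == 1]
    good = [s for y, s in zip(labels, risk) if y == 0]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))


def calibration_table(labels: Sequence[int], risk: Sequence[float],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """For each non-empty risk bin: (mean predicted risk, observed default rate, count)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, s in zip(labels, risk):
        bins[min(int(s * n_bins), n_bins - 1)].append((y, s))
    return [(sum(s for _, s in b) / len(b), sum(y for y, _ in b) / len(b), len(b))
            for b in bins if b]


def share_below(risk: Sequence[float], threshold: float = 0.10) -> float:
    """Fraction of applicants predicted to be under the risk threshold (the approvable 'good set')."""
    return sum(s < threshold for s in risk) / len(risk)
```

A well-calibrated model has mean predicted risk close to the observed default rate in every bin, which matters when approvals are decided by a cut-off such as "below 10% risk" rather than by ranking alone.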
So do you have a machine learning ecosystem supporting you? Do you have it in-house, or is it outside? It's all in-house. All in-house. Yeah. Okay. So how big is, I mean, what is the size of the data you work with? I don't think we've publicized that. Okay. How about the regulatory side? It's hard to get that done. So we're in a weird regulatory position, but I don't know the full details of it. We are not an NBFC. We are technically lending off another NBFC's book, but I believe it's not exactly a loan. I'm not a lawyer, and I don't know those specific legal details, so, sorry. Which markets do you serve other than India? Just India, right now. Same question for you? Is it Vietnam? So Trusting Social started our first service in Vietnam, then we expanded to Indonesia, now we have expanded to India, and there are a few other countries that we are heading to. Can you help us understand why your name is Trusting Social, and what the relationship is? So, as I said, our whole point is using alternative sources of data to do credit scoring. The way we do it is we use social data as the alternative source of data, and we use that to do the credit score, which is something like the trustworthiness of the person, right? That's why we named the company Trusting Social. But by social, do you mean social media, or any other data? Any other data that is considered social. Let's say your friends or anything like that can be called social, right? It's not only social media. Sure. So do you guys have different models for predicting default on new accounts versus, once someone is signed up, whether their limit is increased or decreased? Sorry, can you repeat the question? Different models for predicting defaults versus, once people are signed up, decreasing their limits or giving them benefits. So at the moment we consider each loan application separately.
So we assess the credit score when they make the application. So don't you increase the limit once someone comes on board? Don't you get the chance to do that? Don't you go ahead and increase their limits? Because we do not provide a lending platform at the moment, we are not in the business of increasing limits; the bank, the financial institution, will do that. Our question was, do you have such policies in mind, or how do you think that should apply? I think that applies to Simpl. You're right. Okay, we do give credit bumps. Typically, if you pay your bill on time regularly and it also looks like you don't have enough credit, like if you run out of credit on day five of the 15-day cycle, and you run out of credit on day five a lot, then a bump is probably in your future. Assuming also that our model predicts you're likely to pay the bill after you get the bump. It's model-based. So it's model-based prediction of whether you are likely to pay the bill, and then, well, I can call it a model, but there are two questions. First of all, if I give you a bump, are you going to pay the bill? And the second is, do you even need it? If you have a 5,000 rupee limit and you're spending 600 bucks a month, why would I give you a bump? That's just added risk, and you're not going to actually transact more, so why bother? So that part is a really simple model: are you really spending anything close to your limit? If you are, sure, you'll get a small bump, and if you spend close to the new limit, you might get another one. What documentation do you expect before extending a credit line? Aadhaar? Nothing. We'll ask for your email address once you install the app, that kind of thing. And what is the typical credit line like? Typical credit lines can be a few thousand bucks. Okay, so it's not differentiated based on creditworthiness? It's the same for everybody, like 2,000?
No, so basically we're aiming for a longer-term relationship, so we'll start you off probably not very high, and then as you pay your bill and start to hit your limit or come close to it, we'll start giving you bumps. So you are not governed by RBI rules and regulations? You're running a banking system, technically. It's... honestly, I'm the wrong person to answer about the specific regulations. I know we have lawyers, and every so often they say you can't actually do that, and then I don't do it. So in credit risk modeling, banks previously were using logistic regression, and then came the age of gradient boosting. Now, what is the current algorithm you guys use in credit risk modeling? I can't actually name any specific model that we're using. The algorithm, right? Yeah, we try a lot of models, we have to figure out which model works in which situation, and we can't name that specifically. How important is interpretability of the models you are creating? Some of them might be linear, and one of them might be a black box where you don't know what's happening; it's just giving you a validation score. Is interpretability important? Tomorrow it might not work, and we have to get it out of the way. I think in our case interpretability is not so important; as long as our model works, we accept that. Yeah, I would tend to concur: ROC AUC is something I can turn into money. If the ROC AUC is not high enough, we lose money. Yeah. Interpretable models are easier to debug, but on the other hand, interpretable models are rarely going to perform as well as the complex black boxes. So. What about regulations? What if the customer comes back asking why he's not able to get credit? Customer has an invisible eye and suddenly he is now showing our image. So how do you provide that? Would you say you use reason codes, or do you provide the exact reasons? We don't provide exact reasons.
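The credit-bump logic described for Simpl (pays the bill on time regularly, keeps running out of credit early in the 15-day cycle or spending close to the limit, and the risk model still predicts the bill will be paid) can be sketched as a simple rule. Everything here is a hypothetical illustration: the `Cycle` record, the four-cycle lookback, the 80% "near limit" level, the day-5 cut-off and the 5% risk ceiling are assumed values, not Simpl's actual policy, and `p_default_after_bump` stands in for a separate risk model that is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cycle:
    paid_on_time: bool
    spend: float                  # amount spent during one 15-day billing cycle
    day_limit_hit: Optional[int]  # day of the cycle the limit ran out, or None


def should_bump(history: List[Cycle], limit: float, p_default_after_bump: float, *,
                lookback: int = 4, near_limit: float = 0.8,
                early_day: int = 5, max_risk: float = 0.05) -> bool:
    """Hypothetical bump rule: pays reliably, actually needs more credit, and the
    (separate, not shown) risk model still predicts the bill will be paid."""
    if len(history) < lookback:
        return False  # not enough track record yet
    recent = history[-lookback:]
    pays_reliably = all(c.paid_on_time for c in recent)
    constrained = sum(
        c.spend >= near_limit * limit
        or (c.day_limit_hit is not None and c.day_limit_hit <= early_day)
        for c in recent
    )
    needs_credit = constrained * 2 >= len(recent)  # constrained in at least half the cycles
    return pays_reliably and needs_credit and p_default_after_bump <= max_risk
```

The "do you even need it?" check is what stops a bump for someone with a 5,000 rupee limit who spends 600 a month: the extra limit would add risk without adding transactions.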
So, coming back to the question: we probably can't do business in Europe, at least not yet. From the experimentation perspective you may not want interpretability, but let's say going forward you want to adapt along a particular dimension: you want to cut the model and see whether it actually works that way or not. So isn't there some need for interpretability, right? For example, just to elaborate, let's say going forward you want to lend only for, let's say, food business or something, and you know that certain features will matter if you're focusing only on food. So you need to cut the model across some dimensions, or look at that small microscopic area. To answer this question, I can refer to an interview with Geoffrey Hinton, who can be considered the father of deep learning. In a recent interview, he said that if you can explain a model, then you won't actually need it, because it's so simple; if it's that simple, you don't need a machine learning model. I'll also describe a little bit how these things are often built in practice. There are lots of pieces that are interpretable. For instance, there's a whole library of things we've seen fraudsters do, and here's some code to detect when someone is doing that. Most of these are interpretable, because we know what the guy is trying to do and we know how to spot it. It's relatively straightforward procedural code that basically looks for that specific pattern and fits a few parameters. Then these things feed into another model that is much less interpretable. So very often there are interpretable pieces of a model that then feed into a big black box that mixes everything together.
So the interpretable bit might say, I think there's a 40% chance that this guy is doing this fraudulent thing, and then there might be another piece that is also fairly interpretable that says, I think there's also a 10% chance he's doing this other thing, but I'm not really sure. And then the big meta model also says, he's from a suspicious area; I wouldn't block him on the basis of any of these things by itself, but putting it all together tips him over the limit. This is how a lot of things are engineered in practice. And typically most large models will have various sub-models that each deal with one piece of the data. For example, we have one thing that just deals with email addresses; some email addresses are more suspicious than others. That's kind of interpretable, still a bit of a black box, but at least we know what goes in and what comes out: this model is about this part of the data, that model is about another part of the data, and they all feed into a much bigger thing at the end of the day. So if the government enforces some regulation that requires you to explain why you reject a loan, for example, then I think sometimes we have to make up the reason. It's not a real reason that we understand, but we have to make one up to answer. Okay. Specific examples of made-up reasons? Okay, right. I had an example in 2009 with Citibank: I had a dispute with Citibank over a credit card score, and one of the hard problems they had to explain to the government as part of the dispute was that they had linked some other person's identity with my credit score. Okay. So in court I argued it, and they basically said it was a severe technical error, and it went on for about half an hour. At the end, the judge said, you know what, I don't think you guys understand what you're doing, right?
And so I'm going to award the case to him in five minutes if you are not able to explain it. Right. So this was in India? It's in India. It's my personal case, which I argued before the judge. Okay, that was in 2009. The side effect was that for three years I couldn't open any credit card or bank account, because until then my credit was locked as overdue and stuff like that. So, right? In some form, but I got the money back. I got the money back, and I probably also made them pay half of my home loan as compensation for the mental distress and all that. So it got resolved. The key thing about interpretability is that it's really not important for fintech companies to do it, unless there's a law which says that you shall. And I think that is the part that is important. So that's why you're saying you can't do business in the GDPR, EU area. Yes, yeah. Or put it this way: if we did business in a GDPR area, we would just have to charge much higher interest rates to deal with lower accuracy. Right? So that's the price of interpretability, and there's not much you can do about it: it's either this or higher interest rates. Yes, and similarly, the same trade-off is there for privacy. If we take no data about you, we have to charge you the average of you and all the fraudsters who are also saying I want my privacy. And that might be a lot. What about life? Do you also factor that in? Does life expectancy figure much into that? Same question to you as well. A borrower could possibly die. I mean, maybe you faked that, I don't know. I think that is too specific and too rare to be considered. So basically, I think the question is about insuring the credit. When they sell a credit card, they have insurance for the amount of credit; they take out an insurance policy for the amount of credit, as with a lot of loans.
So when you take a home loan or a car loan, you're supposed to insure the amount through another insurance policy. And they even do it for credit cards in India, for whatever credit line they have. Yeah, it's not an individual policy but a group policy kind of thing. Let me ask you: your system has smaller amounts for shorter durations. I don't know what your duration and amount are. I think it depends on the loan, but probably 12 months, or 10 or 14 months, something like that. But if you ask me about those questions, I am not the right person to answer, because I don't know much about the business side. So. Okay. That's why you asked about interpretability. Basically, as data scientists, we don't care whether it's interpretable; if it works like a charm and whoever is paying sees that it works, fine. But generally you want to slice and dice and ask questions and all of these things, and that's when you need interpretability. No, I think whether interpretability is important or not depends on whether it meets the goals, which, as you said at the start, is the maximum I will lose or the maximum I will gain or something around that. Yeah, we typically focus on how much money we are likely to lose. As long as you lose much less than that, I think you don't really care. Put it this way: the business guys always ask these questions and ask if we could give them something. They don't use the word interpretability specifically, that's a technical term, but they want to know more about what's happening. And I'm like, look, I can either give you less accuracy and we'll lose more money, or you can just accept it, or you can read this paper on gradient boosting or neural networks or whatever. Ultimately, at that point, that's the trade-off; you explain that to them, and they never read the paper on autoencoders. That's simple. I'm mostly the one turning that knob.
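The layered setup described earlier, where interpretable fraud detectors (one saying 40%, another 10%) and a suspicious-area signal feed a bigger model that blocks only when the evidence adds up, can be illustrated with a simple evidence-pooling rule. This swaps in a naive-Bayes-style combination in log-odds space for the learned meta model; in practice the meta model would be trained on labelled outcomes (for example logistic regression or gradient boosting over the sub-model outputs). The signal names, the 2% base rate, the 5% area signal and the 0.5 block threshold are all hypothetical.

```python
import math
from typing import Dict


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def combine_signals(signals: Dict[str, float], base_rate: float) -> float:
    """Naive-Bayes-style pooling: add each signal's log-odds shift away from the base rate.
    Assumes the signals are conditionally independent, which real fraud signals rarely are."""
    prior = logit(base_rate)
    z = prior + sum(logit(p) - prior for p in signals.values())
    return sigmoid(z)


BLOCK_THRESHOLD = 0.5  # hypothetical cut-off

# Hypothetical sub-model outputs, each below the threshold on its own.
signals = {"fraud_pattern_a": 0.40, "fraud_pattern_b": 0.10, "suspicious_area": 0.05}
combined = combine_signals(signals, base_rate=0.02)
blocked = combined >= BLOCK_THRESHOLD
```

With these made-up numbers no single signal reaches the threshold, but because each one is well above the 2% base rate, the pooled estimate comes out at about 0.90 and the applicant is blocked. Each sub-model stays interpretable even though the combination step is where the "tipping over" happens.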
In your case, is loan duration also something you consider? I mean, it might not be possible to get a loan back within a certain length of time. Is that something where, for each individual, you try to predict the duration in which they will be able to pay it back? I think our model works at a somewhat higher level. We're not building that in specifically. We're not talking about 50 or 100 different models, right? We're talking about five or 10 different models for different products, but we're not building models that, let's say, predict repayment after 12 months or 24 months. We don't go into that specifically. But I can tell you that people who do issue longer-duration loans do worry about these things. Mainly I just know this because my wife builds graphical tools for them to actually understand this. For instance, there is a thing called credit score drift. Your credit score predicts your delinquency over the next 90 days on some loan, but your credit score might also just go down or up over your lifetime, and in a larger pool of loans it'll go both ways for different people. So there's typically a random walk model to describe how that happens. You look at random walks, you also look at what happened in the past, and you project. Typically what you'll do is come up with a nightmare scenario, a good scenario, standard drift, standard drift plus more, and then you evaluate what comes out in each of these cases. That's the path of your credit score. But my question is this: there are a few ways you can approach this. You can say, okay, I'm giving this guy a loan and he'll be able to pay it back within a certain duration. You'll have something in mind, like 180 days or 6 days or 12 months or whatever.
Or you can just ask whether he has a probability of paying it back within a long enough duration, or something like that. The other way is: you fix the loan amount, and then you try to predict at what time the user will be able to pay it back, given certain historical data about the user. Is that something which... Well, those are the same thing; it just depends on whether, when you draw the graph, you're seeing where the horizontal line slices or where the vertical line slices. This would be easier with a whiteboard. Okay. But they're basically the same thing. It's just a matter of whether you're looking at a horizontal line or a vertical line on the same graph, and if we had a whiteboard it would be easier to explain. Okay. Next, over here; if you want you can just take it, but keep asking questions. So typically what they'll share with us will be more like their gold, their platinum, their best customers, because Simpl is primarily a convenience product. You push a button and your lunch is on the way; you push a button and you have your movie ticket. No OTP, no filling a wallet, nothing like that. So typically, the customers they want on Simpl are the ones who use their product a lot, who make five or six transactions in 15 days. They want those guys to be on Simpl to remove the friction and make it go from six transactions to seven. So they'll basically say, this guy doesn't have a kitchen, he just orders his lunch all the time, and then they'll just pass that to us. Yeah. If you go to a movie once every six months and use BookMyShow, and that's all the e-commerce you do, Simpl is not a useful product for you; just don't even bother. Whereas, for instance, we're on some cafeteria merchants, so in a corporate cafeteria you might go there for lunch and chai and sometimes dinner.
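The whiteboard point made earlier, that "will he repay within N days?" and "by when will he have repaid?" are two slices of the same graph, can be sketched with a cumulative repayment curve: a vertical line at a fixed day reads off a probability, and a horizontal line at a fixed probability reads off a day. As an assumption for illustration, the curve here is built empirically from a hypothetical pool of repayment times (a per-applicant model would output its own curve instead), and the sample numbers are made up.

```python
from typing import List, Optional


def repaid_by(days_to_repay: List[Optional[int]], day: int) -> float:
    """Vertical slice: share of loans fully repaid within `day` days (None = never repaid)."""
    return sum(d is not None and d <= day for d in days_to_repay) / len(days_to_repay)


def day_reaching(days_to_repay: List[Optional[int]], target: float) -> Optional[int]:
    """Horizontal slice: first day by which at least `target` of the loans are repaid."""
    repaid = sorted(d for d in days_to_repay if d is not None)
    for count, day in enumerate(repaid, start=1):
        if count / len(days_to_repay) >= target:
            return day
    return None  # the curve never reaches the target share


if __name__ == "__main__":
    # Hypothetical repayment times in days for ten loans; None = never repaid.
    sample = [30, 60, 60, 90, 180, None, 45, 120, None, 75]
    print(repaid_by(sample, 90), day_reaching(sample, 0.6))
```

Both functions read the same step curve, which is why the two questions are equivalent: fixing the horizon gives the repayment probability, and fixing the probability gives the horizon, or `None` if the curve never gets that high.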
That's a great Simpl use case: you don't have to deal with reloading your wallet when you just want your lunch to come. You just press the button, okay, you take your tray. Do you do the alternative scoring for companies and corporations as well, or is it only for individuals? Sorry? The alternative credit score models: do you do them for companies or private corporations, or is it only for individuals? At the moment we focus on individuals. Okay. I can tell you a big player actually doing it for companies, in the SMB area, is Stripe. Because if you think about it, Stripe knows a great deal about your cash flow, and they also have a great marketing platform. So small business lending is part of Stripe's business now: Stripe knows your cash flow, they have a rough idea of what's going on, so when they decide you are eligible, you log into Stripe and they ask, do you have cash flow issues? Business equity loan, subject to these terms, collectible against your Stripe payments. I also read about Ant Financial in China. The way they do it is they use data from their platforms, Taobao for example, and other Alibaba e-commerce platforms, which include both small businesses and individuals. In that case they know how well your business is going, how many transactions you have done in the past one or two years, who your customers are, and things like that, and if they know you are doing well with your business, they can approve a small loan, like a few thousand US dollars. And for you to predict cash flow, you also need to know the expense side of things for a company. So Stripe doesn't know everything, but they know some of it: if you're paying from an account that Stripe knows about to another Stripe account, they see that. They don't know everything, but at the very least they know a chunk of your revenues, and I'm sure they buy data as well. Like, the thing I know exists; I don't know exactly how they do it, but I do know
they have this extra data point that no one else has. So going back to the rules, I'm just curious: in the course of this, you have the typical exploration/exploitation problem, where sometimes you explore and see what kind of people can repay, and sometimes you use the data you have. Have you guys researched that, since you're learning and getting some experience? At the moment we haven't used that in our product, but it is one of the directions we are working on. I'll also suggest that, generally speaking, the people you learn the most from are the marginal ones. So if you have a score, let's say one to ten, where higher is worse, and you find a good risk cutoff is three, you probably want to approve a nice little holdout set at four and a much smaller holdout set at five; going all the way to nine is probably just a waste. Then what will happen is you do this for a little bit, you discover it's actually pretty safe to go to four, you repeat it at five and six, and you do this until it blows up. Okay, it's six thirty; we have another five minutes or so, so you can keep asking questions. I would like to avoid fraudsters, yes. I'm trying to prevent them from getting in; beyond that I can't say much. I can't say why. All I can describe is the mindset. With a typical machine learning problem, if you're trying to differentiate cats from dogs, you really just focus on accuracy. If you are trying to deal with an intelligent adversary, you also have to think: in this thing I built, if I know how it works, can I hack it? So it's a mindset you have to get into, and sometimes you'll discover, yeah, this seems to work great, but someone could scam me this way, so I'm just not going to switch it on, or I'll switch it on but in a limited form, and the minute someone figures it out, it's off. It's a mindset you have to get into; it's half computer security, half machine learning. Do you
employ adversaries? Yeah, we try to hack ourselves. We have a bug bounty: if you figure out how to scam Simpl, we'll pay you, you also get an automatic job interview, and we'll try to recruit you if you can figure out how to do it. Yeah, there's a kid out in Jaipur who got some really nice bug bounties for sitting at home playing with us. One thing I want to add about this is the way we consider machine learning: instead of treating machine learning as a fully automated system, we'd better combine what machine learning does well with what humans do well, so we have to find a balance between the two. And also, you have to be really creative when you're trying to think this stuff up. Here's an attack someone might make: they order groceries, and if they figure out a way into the system, they'll order more groceries than any human can consume and turn around and sell them at a discount. This is a thing we actually discovered someone doing. If they can do it over and over again, it's just amazing how many groceries go to this one guy's house. So these kinds of things happen, and if you're just like, oh yeah, I made the numbers go up, and you haven't even thought about exactly how someone could scam this, you're going to get scammed. And the other thing is, the numbers are going to look great, great, great, and then where did all that go? Yeah, that's the black swan problem. It's not such a black swan; it's just one asshole. Yeah, that's the black swan. We are actually sort of building something: we track it internally and we keep them out. We're not actually part of CIBIL, so it doesn't get reported to CIBIL, but there are also some alternative reporting systems that are basically getting started, so other alternative lenders may also not like you in the future. What about you? How do you track and manage fraud? If we are talking about the current business model, then that is basically part of the responsibility of the bank, and of course we
take part in that as well, but at the moment we have to work with the bank on how to figure out those frauds. Do you actually help the bank in figuring out the fraudsters? Yep, yep, of course. As we said, we want to get rid of the fraudsters, right? So before we send the score to the bank, if our fraud detection model predicts that these guys are fraudsters, then we just reject them. Okay, yeah. And the fact that you don't have to offer an explanation makes it easier? Yes, I think that's why it works more easily in Asian countries; I think we have to find a way to work in Europe or the US. Additionally, a lot of fraud signals are not going to be anything close to perfect. There are a lot of things that are a very strong prediction, like, this guy has a 50-50 chance of fraud. Now obviously 50-50 is not a credit risk you can take, particularly when the guy's going to just keep doing it and he'll make your entire portfolio go to 50-50. But at the same time, that's certainly not going to hold up in a court of law: this guy has a 50% chance of fraud. Still, okay, if you don't block those people, the whole network shuts down. Any other questions? No more questions? Alright, we're almost out of time. Thank you, everyone. I hope you caught most of it in 90 minutes. Yeah, thank you, thank you.