So, good afternoon everyone. My name is Arpit and I am the director of data science for Zoomcar India. Today we will be talking about a very unique use case, car damage detection, which is one of the core problems we face at Zoomcar every day. Just a quick check: how many of you have heard about Zoomcar before? How many of you have used Zoomcar before? Nice, good to see that. Then I do not have to spend much time detailing out the problem context.

Let us go through the agenda. We will be talking about car damages, which is a day-to-day use case for us: what the problem is and how big it is. Then we will talk about the internal processes we have set up to solve this problem at scale, how we have deployed machine learning models and how AWS SageMaker is helping us do that, and at the end, the challenges that are still unsolved and the way forward to solve them.

So, the problem statement. We run around 4,000 trips every day, and around 6% of those trips meet with an accident. These are severe accidents, anything from a dent to a complete breakdown of the car. On top of that, there are an equal number of incidents involving minor scratches, major scratches, or multiple scratches on the car, all of which need to be detected and treated before the next trip begins. So in total, I would say around 500 cars get damaged every day, across different parts of India, in different nooks and corners.

Why does this happen? Because we run a completely unmanned service: you will not find a fleet executive who assists you in picking up the car, because you can lock and unlock the car through the app. That means there is no one to monitor whether damage is happening, who caused it, and whether we should rectify it.

So how do we get to know about the damage? When the next customer comes to pick up the car, we have a checklist where they mark that the car is already damaged. That is one way we get to know. We also have a general inspection after every two bookings, which is another source.

Once we know about the damage, there are two parts to it. One is making sure the car is repaired on time; the second is attributing the damage to someone. In our internal system there is an upper liability of ₹10,000 if you cause an accident, and some amount gets attributed to the previous customer, which they generally pay. So this is the problem statement we will be working on.

Now, what did we change as a process? To even start solving this problem, the basic thing you need is an image. So around four or five months back, we introduced a checklist module that has to be filled at the start of every trip, where you capture images of the car from all four sides at that point in time. That gives us a good enough database of images to do any kind of processing on top of. Initially we started this process so that every time we put a charge, we have verification for it: this is the damage that was caused, and that is why you are being charged. The same damage report gets passed on to the city operations team so they can send the car to the workshop and get it rectified. What was still not solved is that this damage attribution was being done manually.
Now imagine there are 5,000 trips every day, and every trip has at least 20 images of the car. So around a lakh of images are there already. How can someone go through these manually? It is not possible. A lot of things go unmonitored, a lot of damages don't get repaired, and that is exactly what we see in our customer feedback: we get complaints that the car is not in good shape. These are problems you run into when you go to scale. So what we have done is partner with the AWS team, who are helping us deploy this deep learning model for image detection, and I would like to invite Vinay to speak about how they are executing this.

Hello, am I audible? Okay, cool. Thanks, Arpit. So, I am Vinay, I am part of the solutions architect team and I have been with AWS for around two years now. I will briefly talk about what we have built so far and what we are solving in terms of this particular use case. Arpit set the context on what we are actually trying to do: we are basically trying to assess the car damage itself.

On the tech side, the first part is the data itself. We have trained a model, and the dataset is of this size; there are a lot more images that Zoomcar can provide, but for the initial iteration this is the amount of data we have. The data has been sourced from here; if you want access to it, all you have to do is fill out a form at that particular link.

Now, if you look at the problem itself: even though the simple problem statement is "detect damage," when you break it down there are three main problems you have to solve. The first is, from the image, figure out whether it is actually an image of a car at all. People can upload random images: interiors, shots without the car in focus, and so on. The second problem is damage validation: whether there is any damage or not. The last is the severity of the damage; obviously Zoomcar would not want to go after somebody who has just a minor scratch. Two of these are binary classification problems, and the last one is multi-class classification. For car validation, dataset sourcing is pretty simple; you can pull images from anywhere, even Google search results. But for the other two, we needed images with actual damage in place. These are the number of images that we have.

And this is actually a live demo; we will give you the link at the end, so you can try it out yourself. At an extremely high level, the application looks like this: the actual application sits behind an API Gateway, and whenever you upload an image, it goes to a Lambda function, which talks to a SageMaker endpoint to get the results and returns them. That is a very high-level overview. Just a quick check: how many of you have heard of or used SageMaker on AWS before? Okay, cool.
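To make that flow concrete, here is a minimal sketch of the Lambda glue between API Gateway and a SageMaker endpoint. The endpoint name, payload format, and response shape are assumptions for illustration, not Zoomcar's actual code:

```python
import base64
import json

import boto3

# Hypothetical endpoint name; the real one comes from the SageMaker deployment.
ENDPOINT_NAME = "car-damage-classifier"

runtime = boto3.client("sagemaker-runtime")

def handler(event, context):
    # API Gateway (proxy integration) delivers the uploaded image as a
    # base64-encoded request body.
    image_bytes = base64.b64decode(event["body"])

    # Forward the raw image to the SageMaker inference endpoint.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/x-image",
        Body=image_bytes,
    )

    # The built-in image-classification algorithm returns a JSON list of
    # class probabilities, e.g. [p(no_damage), p(damage)].
    probabilities = json.loads(response["Body"].read())
    return {
        "statusCode": 200,
        "body": json.dumps({"probabilities": probabilities}),
    }
```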
So, just to set a bit of context: SageMaker is basically a platform for end-to-end machine learning. That is a very broad statement, so what it means is that SageMaker is not just one service; it has multiple components that help you deal with the different stages of machine learning. There is a service called Ground Truth for data annotation, managed compute for training, and managed compute for deploying the trained model. All of that is available under a single umbrella, the SageMaker platform.

So how did we actually arrive at that web application? Initially, the training data resides on S3, all those 1,900 and 900 images that I talked about. The training code, an MXNet-based image classification algorithm, sits on ECR, which is AWS's Docker container registry. We have containerized that algorithm so that anybody can pull it and start using it. The only prerequisite, if you use it directly, is that the data has to sit on S3. Once that is done, the training code is pulled from ECR and the dataset from the S3 bucket onto EC2. Again, you don't have to do this step yourself; if you just point the model towards the dataset, it pulls it by itself. The training can be done on any run-of-the-mill EC2 instance, with no specific requirements as such.

Once the training is completed, we get what are called model artifacts. The model artifacts are basically your params file, the hyperparameter details that were used for training, and so on. All of this is dumped into another S3 bucket, which is used for the actual inference.

The next step is inference. Within the SageMaker platform, again, you don't have to do this manually: we pull the model artifact from S3, put a wrapper around it, something like a serving API, and host it for you using an inference endpoint. What that means is that we put a load balancer in front of a bunch of deployed model artifacts so that the number of EC2 instances scales automatically. Depending on the number of inference requests coming in, the backend infrastructure scales up and down. All of this is managed for you; that entire platform is SageMaker. So that is the flow.

Just to give you an example of the results we have achieved so far: this is a pretty typical image, somebody takes a photo and uploads it. The first check is whether it is a car or not, which is detected. The second check is whether there is any damage, which is also detected. And then we are also able to tell where exactly the damage is and how severe it is. That is one example. There is one more example: this is a much more minor damage, which is harder to detect, but the model is still able to detect these kinds of scratches, using the same model itself. So that is the technical part of how it is done. Feel free to give it a shot yourself; the entire walkthrough is in the SageMaker image classification algorithm documentation. With that, I'll hand it back to you.
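For anyone who wants to reproduce that flow, this is roughly what it looks like with the SageMaker Python SDK. The bucket paths, IAM role, instance types, and hyperparameter values below are placeholders; only the overall shape (built-in image-classification container pulled from ECR, data on S3, artifacts back to S3, deploy to a managed endpoint) follows the talk:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role ARN

# Pull the built-in (MXNet-based) image-classification container from ECR.
container = image_uris.retrieve("image-classification", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p2.xlarge",                      # illustrative choice
    output_path="s3://my-bucket/car-damage/output",    # model artifacts land here
    sagemaker_session=session,
)

# Hyperparameters for a binary damage/no-damage classifier (illustrative values).
estimator.set_hyperparameters(
    num_layers=18,
    image_shape="3,224,224",
    num_classes=2,
    num_training_samples=1900,
    epochs=10,
)

# Training data must sit on S3; SageMaker pulls it onto the training instance.
estimator.fit({
    "train": TrainingInput(
        "s3://my-bucket/car-damage/train", content_type="application/x-recordio"
    ),
    "validation": TrainingInput(
        "s3://my-bucket/car-damage/validation", content_type="application/x-recordio"
    ),
})

# Deploy the artifacts behind a managed, load-balanced inference endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```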
So you can see that the model works pretty well, but there are some challenges, of course, and I will take you through them now.

The first challenge: 50% or more of our bookings happen at night, starting around 8 PM. That inherently means there is no good lighting around and the image quality cannot be very high; either someone uses a flash, or the light is very dim. And especially when someone has caused damage, they will try to hide it as much as possible. Even if there is light, they will position the car so that we are not able to see the damage. These are practical challenges that none of the models can solve for as of now.

The second challenge: inappropriate images. In the first image here, the person has used too much flash. Not really their fault, but these kinds of images are not helpful for us. The second image is taken from very close proximity, which means we do not get to know which part of the car it is, nor do we get a complete view of one side of the car. And in the third image, there is a tree in between; if there is damage on the part of the car covered by the tree, we won't be able to detect it.

The third challenge: in the first image here, the car is actually pretty dirty; if it were a bigger image, you could make it out. So how do you distinguish whether the car is dirty or whether that is a scratch? It is a pretty difficult problem. Second, we also take images of the car interiors, because we want the interiors to be clean as well; one of the other problems we will be solving is whether the car is clean or not, also using image detection. So how do you solve for use cases which are not very clear, like shots from inside the car? When we ran the model with Vinay, we figured out that images taken properly in good daylight are solved for perfectly, but another 50 to 60% of the images are like these, and they cannot be solved for immediately, unless you have very well-annotated training data or a lot of pre-processing is done on the images before they go into the model.

So what we have to do in future comes down to three steps. First, the process of capturing images: we will make the image-capturing process a little more stringent. There will be more guidelines around how to capture the image and what it should cover, and there will be AI enabled at capture time, so that as soon as you capture the image, you get to know whether it is within our policy limits or not. If, for example, it is not a car at all, or it is too close, the app should tell the customer in real time; only then can they course-correct. Second, we have to train the model correctly, but we realize we have to take a hybrid approach; we cannot automate it 100%. We will pre-process the images as much as possible, but some percentage of images will have to go through manual verification, and those will then feed back into the training data. So there will be a model in front of this model that identifies whether an image is good enough for damage detection or not. We are planning something like a 30/70 split: 30% manual and 70% automated.
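As a toy illustration of the capture-time check described above, a brightness filter might look like the following. The thresholds and messages are invented for illustration; a real check would be tuned on actual data and would likely run on-device:

```python
import numpy as np
from PIL import Image

# Illustrative thresholds; production values would be tuned on real images.
MIN_BRIGHTNESS = 40    # reject very dim night shots
MAX_BRIGHTNESS = 215   # reject blown-out flash shots

def capture_precheck(path: str) -> str:
    """Cheap sanity check before an image is accepted.

    Returns a hint the app could show the customer in real time.
    """
    # Convert to grayscale and measure mean pixel intensity (0-255).
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    mean_brightness = gray.mean()

    if mean_brightness < MIN_BRIGHTNESS:
        return "Too dark: move the car to better light or enable flash."
    if mean_brightness > MAX_BRIGHTNESS:
        return "Overexposed: turn off flash or step back."
    return "ok"
```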
So in future, what I foresee is a hybrid model. And third, we cannot really get away from the fact that a lot of trips start in the evening, so we have to put in more data taken at night and train the model to detect damage even when the light is low. That again requires a lot of pre-processing from a data point of view. These are some of the steps we are taking; it is still a work in progress, and it is a big project for us. That is all from our talk. We are open to questions if you have any.

Yeah, before you ask the questions: these are all the details of the algorithms being used, if you want to go into how we have done it, and there is a demo on the first link. And this is my contact; you can reach out to me.

Sorry, could you repeat that? So right now it is not equipped to do that. In fact, that has been a very common ask in another segment, fintech, where agents currently have to be in person to do KYC, which is basically verifying photos and documents. As of now it doesn't solve for that, but it is certainly a use case we have come across earlier as well, so that is a work in progress.

[Audience comment, partially inaudible.] Yeah, so we haven't thought about that, to be frank, but it is interesting; a lot of people are raising very interesting points that will help us improve further. This is how we crowdsource. But I think what we can do is also use the image of the back of the car, where we have the registration number. We would then extract the text of the license number from that image and match it with the car that has gone out on the booking. If that matches, fine. So yes, we will take that into consideration.

Yeah, that's correct; it won't solve it completely, so we have to think about other measures too. One thing is that when we ask people to take the image, they cannot upload it from the gallery; they have to take the image on the spot. So the instances of such things happening get reduced. But what happens with video is that when it gets uploaded, it takes a lot of time and requires a lot of internet bandwidth, so that is a challenge. And when people come for a trip, they don't want to spend a lot of time filling out questionnaires and taking images; that is a hindrance for the experience. So we have to find a good middle ground. Thank you.
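The registration-plate matching idea from that answer could be sketched with Amazon Rekognition's text detection. The function and matching logic here are illustrative assumptions, not something Zoomcar has built:

```python
import boto3

rekognition = boto3.client("rekognition")

def plate_matches_booking(bucket: str, key: str, booked_plate: str) -> bool:
    """Extract text from the rear-of-car photo on S3 and check whether
    the booked registration number appears in it."""
    response = rekognition.detect_text(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Normalize spacing/case before comparing, since plates are often
    # detected as separate words or with inconsistent spacing.
    detected = {
        t["DetectedText"].replace(" ", "").upper()
        for t in response["TextDetections"]
    }
    return booked_plate.replace(" ", "").upper() in detected
```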
So yeah, my question was around multiple inputs. One was video; it is basically the same question someone asked about whether it is the real car or not. How do you consider piping in different inputs? One is video, and I know SageMaker has video analysis as well; the other is photo. And you talked about making the input process more stringent, but that runs into the same problem: if, as a user, I am pained by the things you ask me to do, and every time you say, oh, there is a tree, take a proper image, oh, it's dim light, then probably my user experience is going to be so bad that I might reconsider coming to Zoomcar at all. So that is one side. The other is more on the AWS side: in your positive use cases, what is the kind of accuracy you are looking at? And also, is accuracy the most important thing, or are you able to look at why the model took a decision, why it decided something is damaged or not? In some cases it is very apparent, but there may always be borderline cases where you cannot tell; that is where the model fails. The obvious cases, I think, Arpit pointed out. But take a borderline case where, let's say, there is a very light dent you can barely make out. What is your accuracy? Where does it fail to detect whether there is damage or not?

Okay, so your question has two parts. I'll answer the first one and pass the second to Vinay. The first part is about customer experience: if we become too stringent, how do we handle that? Any decision we take is iterative and goes in phases. For example, someone proposes that we deploy a model that detects whether the image is of a car and keeps asking the person to re-upload if it is not. But we know for a fact that models are not 100% correct, so there can be false negatives: it is actually a car, but the model doesn't detect it. In such cases, practically, we do not ask for a re-attempt more than two times. Beyond that, we just flag that something is wrong and have a manual check done at some point during the trip. That way the customer experience is not hurt, and we still try to get as much good data as possible. That is the first part; the second part, Vinay.

So the accuracy depends mainly on two factors: one is the quality of the initial dataset used for training, and the other is how good the model itself is at identifying damage. The training dataset is pretty small right now, so the accuracy is not great for very minor damage. That is one factor. The other is that we are using a general image classification algorithm off the shelf. Going forward, the idea is to refine it so that it is trained towards this particular domain rather than being a general classification algorithm. And if I'm not wrong, you also have a threshold on the amount of damage you look at, beyond which it's a problem, right? Do you want to take that?

So as a company, we have decided that we won't charge customers for small damages, because we have insurance, and it doesn't come across as a good customer experience; we have seen that customers don't come back if we charge even for minor scratches. So we have taken that call: we only charge for major dents and damages. For those, the chances of high accuracy are good, so that addresses both customer experience and accuracy.

So this algorithm you mentioned, has it been developed by AWS, or... So right now it has been developed by the AWS team at Amazon, and there are two parts to it. One is the actual algorithm itself, which is based on MXNet and is open source; the link is there, the image classification one. What SageMaker does is provide a wrapper around it that makes it easy to deploy the model yourself. The SageMaker part is what we work on; the actual algorithm itself is open source, and you can have a look at it there.
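The two-re-attempt policy Arpit described a moment ago is simple enough to sketch; the function name and return values here are hypothetical, not Zoomcar's actual code:

```python
from typing import Callable

MAX_REATTEMPTS = 2  # mirrors the "no more than two re-attempts" policy

def next_action(image_bytes: bytes, attempt: int,
                looks_like_car: Callable[[bytes], bool]) -> str:
    """Decide what the app does with an uploaded checklist image.

    `looks_like_car` stands in for whatever model call answers the
    car-validation check.
    """
    if looks_like_car(image_bytes):
        return "accept"
    if attempt < MAX_REATTEMPTS:
        return "ask_retake"          # prompt the customer to retake the photo
    return "flag_for_manual_check"   # stop nagging; queue a manual inspection
```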
So is this detecting whether it's a car or not, or whether the image has damage? And is that specific portion of the algorithm hosted as an API in AWS that we can leverage for other use cases in our company?

So I think I get what you're asking. You're basically asking whether, without having to do any of these steps, you can still get an API sort of experience as a developer, where you directly ask whether an image is damaged or not. Not as of now. The closest we have is a service called Rekognition, which is a general-purpose image analysis service: if you send it an image, it gets back details on the main entities in the image, people, face detection, things like that. But damage detection is not part of it, so it's not there yet; this is a specific use case. Thanks, guys.