Hi, my name is Vikram Shankupta. I am with iMerit Technology Services, and I'm going to talk about enterprise-grade data labeling. Now, I know this is not a topic that usually makes the pulse of the AI research community beat faster. And some of you who are at an early stage of the AI journey might even be wondering: what is there to talk about in data labeling anyway? You have some data, you have your in-house teams (I'm sorry, I picked up a cold), and if that doesn't suffice, you can always crowdsource. But there's actually quite a lot to talk about, particularly when you get to enterprise scale, and I hope by the time we finish this talk, most of you will agree.

So let me start with a story. Is there anyone here who knows this gentleman? His name is Bill Bowerman. He was one of the most celebrated athletic coaches in the US, and he's also a co-founder of Nike, the sports company. Mr. Bowerman pioneered most of the early designs of Nike footwear. Among the most famous was an experiment he carried out called the waffle iron experiment: he took a piece of rubber and pressed it in a waffle iron, and that resulted in a design that was both lightweight and had a very firm grip. With a little more refinement, this turned into a highly successful line of shoes, the Waffle Trainer, and that led to explosive growth for the company. Now, when I heard this story, two things really struck me. One was how this gentleman's obsession with producing high-performance athletes turned into an obsession with every step of the process needed to create those athletes, right down to the soles of the shoes the athletes were wearing. And second, however good and fancy a shoe might look from the top, what really matters at the end of the day is the support the humble sole provides to the rest of the shoe, because that can make or break performance.

It's not unlike the design of AI systems. While what really excites the research community is the work on algorithms and deep nets and the training of models, what is equally important is the quality of the data that's collected and the quality of the labels that flow through the pipelines to train those models. We sometimes underestimate the importance and the complexity of these two steps. We are also at the threshold of Software 2.0. In Software 1.0, programmers did not really have to worry much about the data; they could focus on understanding the logic and implementing it well, giving precise instructions to the computer. All of that is changing with the advent of AI. As this quote reminds us, a large portion of programmers, already today and certainly tomorrow, will be collecting, cleaning, manipulating, labeling, and visualizing the data that feeds into neural networks. So the data is becoming almost an intrinsic part of the algorithm. The lines between the two are becoming very fuzzy, with the outcome depending as much on the data as on the code. To cut a long story short, the point I'm trying to make is that an AI company needs to be as mindful about its data strategy as about its algorithm strategy. So let me go a little deeper into the data situation.
So how much time does data take up in a typical AI lifecycle? How long does it take to annotate the data? Figure Eight, one of the well-known providers of data labeling solutions, estimates that as much as 80% of the time is spent on data preparation and labeling. A similar statistic comes from Cognilytica, which estimates about 80% overall, with 25% on data labeling alone. To make that concrete, look at self-driving cars, one of the most widely used examples of data labeling in computer vision: this paper reports that one hour of video requires 800 hours of annotation. And of course, for a self-driving car, one hour of video will not suffice. You need thousands of hours of video, and millions of frames on which you have to do annotations. Speaking from our own experience at iMerit: from one of our automotive customers we get an average of a quarter million to half a million frames per month. Each frame has around 10 objects on average, and something like full segmentation requires around 45 minutes per frame for one person. Clients then typically ask for multiple judgments, say three to five people working on each frame, so that they have a better understanding of the quality. (I'll put some arithmetic behind these numbers in a moment.) Similarly, from the medical domain, one medical imaging customer sends us 200,000 endoscopic scans, with an average of two anomalies per scan, and they get multiple judgments on every data piece.

I mentioned semantic segmentation, but that's only one of the many ways you may have to annotate the data, and what we have been seeing at iMerit is that annotation needs are becoming increasingly complex. Even a year and a half to two years ago, we were pretty much working on bounding boxes: simple rectangular boxes that you draw tightly around objects. That kind of annotation can be done in a matter of seconds. Then there are polygons, where you have to draw precise boundaries around objects; this takes a little more time, but you can still finish in a minute or so. Then there are key points, where you don't worry so much about boundaries but have to understand which points are important for a specific use case and highlight those; we'll see an example of that. Then there's segmentation, where all objects need to be precisely marked and grouped by type, and this can take anywhere from 45 minutes to an hour or more. Then there are videos with multiple frames, where you can't annotate a frame in isolation: you have to account for objects moving into the next frame and keep some consistency across frames, which of course takes a lot more time. And then there's 3D LiDAR. With LiDAR you don't get conventional images; the sensors emit signals that reflect off objects, and you get dense point clouds. They are very hard to make sense of on their own, and you have to work with 3D models of them. Usually the way you make sense of a point cloud is by looking at the accompanying 2D images and correlating: OK, here's a car here, so this cluster of points might be representing a car, and so on.
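To put the automotive volume numbers above in perspective, here is a quick back-of-the-envelope calculation using the figures just mentioned. It's an illustrative upper bound (it assumes every frame gets full segmentation with multiple judgments, and roughly 160 working hours per person-month), but it shows why annotation dominates time and budget:

```python
frames_per_month = 250_000   # lower end of the quarter-to-half-million range
minutes_per_frame = 45       # full segmentation, one person
judgments = 3                # lower end of the 3-5 judgments clients ask for

person_hours = frames_per_month * (minutes_per_frame / 60) * judgments
annotators = person_hours / 160  # assuming ~160 working hours per person-month

print(f"{person_hours:,.0f} person-hours per month")    # 562,500
print(f"~{annotators:,.0f} full-time annotators needed")  # ~3,516
```

In practice the mix of lighter annotation types (boxes, polygons, key points) brings this way down, but even a fraction of this number is a serious operational commitment.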
That obviously takes a long, long while. And if that weren't complex enough, there's multi-sensor fusion, where you actually combine LiDAR images from multiple angles.

Then there is the whole question of domain, and as soon as you bring in domain, it can add several layers of complexity. If you're building something like a travel AI assistant, a lot of people have general knowledge about travel, so you might get by with general skills. But as soon as you get into areas like current events or fashion, you need more specific world knowledge, and that knowledge has to be present in the annotation workforce you're working with. Then there are domains like e-commerce that are very jargon-rich: you have to deal with a whole lot of brands, products, and categories, so that needs a lot of specialized skill development. And then you get to domains like healthcare, finance, and law. These are heavy-duty, complex subject matters, and there you need really deep domain knowledge. If you focus just on healthcare, what you see on the right are examples of use cases you might have to annotate for. The first is segmentation: you're presented with a scan containing certain organs, and you probably need some knowledge to understand which organ is which, but mostly you need to navigate the annotation tool well and mark them out. If you get into identification, you're dealing not just with identifying organs but with making sense of tissues on those organs and figuring out which tissues might be problematic; there you need skills like anatomy, physiology, and pattern recognition. Then there's classification: you've identified that something is wrong with the tissue, but is it a lesion, is it a tumor, and so on? That requires a deeper understanding, working with multiple dependency trees, and a lot of deep domain knowledge. And then, of course, the holy grail is diagnosis and treatment, where through these data annotations, plus additional study of the clinical history and contextual analysis, you can actually arrive at a precise diagnosis and prescribe a treatment. Typically that can only be done by experts; you're quite well positioned if you can get to the classification stage, which is really where we currently are.

I hope what I've said so far gives you a sense of the complexity of the work involved, and why, for some of the higher-value use cases, customers are increasingly looking at enterprise annotation. There are several considerations involved here. The first is data security. We're all familiar with the quote "data is the new oil," and for AI companies, data certainly represents a big competitive differentiator. It's almost intellectual property they must safeguard. So it's very important that the annotation partner they work with really understands that and puts the right processes in place to make sure the data is safeguarded. Now, that's very hard to do.
You can do it in-house; it's very hard to do when you're working with a crowd. But if you're working with an enterprise-scale partner that has certifications such as SOC 2, they can make sure the many layers of security protocols are in place: how you give access to the premises, how you give access to the production floor, making sure cameras and mobile phones stay outside, making sure downloads are disabled, and so on.

The next consideration is quality and throughput. Quality means things like precision and recall; throughput means productivity. How do you actually improve them? One thing we have really seen clients appreciate is consistency in judgment. For example (and this is an example I also gave at the birds-of-a-feather session), if the client wants a person walking on the road to be labeled as a person for a self-driving car, but a person riding a bike to be labeled as a rider, they will actually appreciate it if you go back to them and point out that a baby in a stroller is probably also a rider, because you would want to treat that case separately. (I'll sketch below how a rule like that can be pinned down.) Then there is domain knowledge and targeted skilling. The way you usually improve quality and throughput is through very targeted skill development, and skill development gets a lot more involved when you're dealing with domain knowledge, in areas like finance and healthcare. Clients also appreciate and benefit from domain knowledge being retained across iterations. The data that needs to be labeled usually does not come in one large burst that you label and are done with; it comes across multiple iterations. You spend a lot of time working with the client on the data, understanding the edge cases, building a good contextual understanding of the use case, and then you're done with that batch and wait for the next one. If a new set of people works on the next batch, it eats into your productivity: everything starts from scratch again. An enterprise service provider will typically have smart workforce allocation techniques in place to retain those learnings across iterations. And finally, there is custom tooling and insights. A lot of AI companies have their own tools; some expect the data partner to bring a tool, and I'll show some examples of that. When you bring in tooling, clients appreciate the ability to customize that tool, and they appreciate getting insights. The insights can be on the data itself: if you're working in a domain where you've already served multiple clients, you have a good understanding of the use cases and the things that can go wrong, and our clients really benefit from that. The insights can also be on the tool itself: what small changes can you make to the interface to increase the productivity of annotation?
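To make the consistency-of-judgment point concrete, here is a minimal sketch of how such a labeling rule might be codified so every annotator resolves the edge case the same way. The class names and attributes are illustrative, not any client's actual taxonomy:

```python
from typing import Optional

# Anything carried by a wheeled conveyance counts as "rider" -- including,
# per the client's intent, a baby in a stroller.
WHEELED = {"bicycle", "motorcycle", "scooter", "stroller", "wheelchair"}

def label_person(on_conveyance: Optional[str]) -> str:
    """Return the class for a human figure, given what (if anything) carries them."""
    return "rider" if on_conveyance in WHEELED else "pedestrian"

assert label_person(None) == "pedestrian"    # person walking on the road
assert label_person("bicycle") == "rider"    # person riding a bike
assert label_person("stroller") == "rider"   # baby in a stroller: same rule
```

The point is less the code than the practice: once an edge case is resolved with the client, it gets written down as an explicit rule rather than left to each annotator's intuition.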
At this point, I'm happy to introduce iMerit. iMerit is a tech-enabled data services company: we leverage human knowledge and judgment, along with technical skills, to serve AI clients. I think what sets iMerit apart is that it delivers high-quality managed services while effecting positive social and economic change. We have between 2,500 and 3,000 employees; so far we have served more than 130 clients; and we operate out of nine centers. 80% of our workforce comes from various underserved communities, and 50% of the workforce is women. We like to say that we stand on two feet: one foot stands for our social and inclusive mission, building aspirational careers for youth from marginalized communities, and the other stands for client value and business impact.

Here is a small sample of use cases we have worked on. It also gives a sense of the kind of experience you gather when you do enterprise data annotation as a business, and that you can bring to bear on a particular project. I already talked about autonomous vehicles and street scenes; that's a very popular example, and a lot of our work is in that domain. We do a lot of work on smart agriculture, which essentially means looking at aerial images of crops and separating the healthy crops from the ones that appear diseased. Then transcription of old documents: birth records, death records, marriage records, censuses, and so on. These are typically written in very cursive handwriting, and it takes quite a bit of training for anyone to make sense of them. Another one that's a personal favorite is annotation of surveillance videos: looking at footage coming from forests and highlighting frames that might indicate poaching risk, for example human poachers, or vehicles approaching through the forest. I briefly mentioned medical AI: identification of tumors and lesions in medical scans. We did a very interesting project with a major museum, categorizing their collection of art objects. This was difficult because the art objects were very contextual. You can't just say this is a sculpture and this is a painting; we actually had to read about each object, get the context in which it was made, and then categorize it appropriately. Then premium assessment for property insurance: you look at a building, its front elevation, its roof type, and tag those, and that goes into models that can then estimate the insurance premium for the building. The last example is risk assessment of power assets: assets that run through dense forests. When you annotate them, that goes into models that assess the risk from, for example, forest fires. What would happen if a forest fire broke out?

Here is one more example, this one from the sporting domain. In this case, the client's end goal was to build 3D skeletons for analytics, meant for pitchers in baseball games. What you see here is called key point annotation: we had around 20 key points that had to be positioned at the right places, indicating joints on the human body. And as you can see, it's not as if there's only one kind of pose.
There can be a lot of contortions and distortions because of the way the game plays out. We were essentially given videos captured live during games, and from these annotations 3D models were built. These were used both for understanding how players' performance could be improved and for understanding the risk of injuries, for example from studying the bends and movements. This started with the Chicago Cubs (I think they made good use of these analytics, and they went on to win the World Series), but it has now expanded to multiple teams, and we're also looking at batters and fielders.

OK, so now I'll get into some of the practices we have gathered over the years of working in annotation. One of the first things I'd like to point out (this also came up in the birds-of-a-feather session, where Vijay made the same point) is that you have to work in close lockstep: the annotation team and the client have to work very closely together. It's not about throwing data over a wall with some instructions to the annotation team and having them come back with annotated data. We have evolved a collaborative framework, and we work very closely with the client across it. It starts with expert consultation. We have solution architects: people with expertise in specific domains. For example, we have a doctor on board; he's a solution architect and leads our discussions with medical clients. We have PhDs in linguistics, natural language processing, and so on. Just to give you a sense of how an enterprise annotation company works: once the sales team brings in a deal, the solution architects sit down with the client. They work out what the client wants to do, what the use case is, and what business value the client is after. Then they look at the data and figure out its annotation needs, and how those can be converted into tasks in a way that keeps the cognitive load within manageable limits, because the person doing the annotation will not be an expert. After capturing that knowledge, the solution architect works closely with the learning and development team (by then a team of annotators will have been brought together to serve this client), and the subject matter expert and the learning and development team come up with a training program. Everyone who will work for that client on those use cases goes through that program, and there are assessments they have to pass in order to serve the client. Along with that, the workflow by which you deliver results to the client has to be customized. This takes into account, for example, the client's needs with respect to how many judgments are needed on a piece of data, how many people should annotate it, what the level of quality checks should be, and what can happen in sequence versus in parallel (a sketch of such a configuration follows below). Then there's a feedback cycle. This happens in the early stages of a project: we prefer starting with a proof of concept, where no penalties are assigned, so you can work with the client very closely and firm up understanding and expectations on both ends.
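As promised, here is a minimal sketch of what such a per-client workflow configuration might look like. The field names are hypothetical; a real configuration would carry much more (tool settings, SLA targets, escalation paths):

```python
from dataclasses import dataclass, field

@dataclass
class ClientWorkflow:
    client: str
    judgments_per_item: int = 3        # how many annotators see each item
    qa_sample_rate: float = 0.10       # fraction of output that gets spot-checked
    review_stages: list = field(
        default_factory=lambda: ["peer_review", "qa_audit"])
    stages_in_parallel: bool = False   # run quality checks in sequence or parallel

# A hypothetical medical-imaging client: more judgments, heavier QA.
medical = ClientWorkflow("endoscopy-project", judgments_per_item=5,
                         qa_sample_rate=0.25, stages_in_parallel=True)
```

The design point is that these knobs differ per client and per use case, so they need to live in configuration rather than being baked into the delivery process.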
Typically we've seen that at this stage, when you start working on the data, you hit lots of cases not covered by the instructions you were given. For example, again from an autonomous car project: we had to annotate persons, and then there might be a reflection of a person in the window of a car. What do you do with that? No instruction was given for how to handle it. As you work through those samples of data, you go back to the client; the client also gains understanding and comes back with instructions for how they think such cases should be handled. Often the instruction document you start with keeps evolving for a few weeks, and only then does it stabilize into a common understanding on both sides. Then the production work starts, and that's usually when you really start tracking things like accuracy (there was some discussion of that in the birds-of-a-feather session as well), along with productivity, service level agreements, and so on.

The other thing to keep in mind is that context is really king here. Understanding the context, and why you're doing the work, really matters. If you just have to annotate a photo like this, it's very difficult to answer: what should we annotate? Is it the person? Is it the vehicle? It really depends on the use case. If your end goal is to avoid hitting people on the road, if this is for a self-driving car, you probably need to annotate both; you certainly need the person on the side. But if all you care about is counting vehicles, the person probably does not matter. So understanding the context, and all the objects that make up that context, is extremely important.

Second, the UI matters, particularly for productivity. The client might say, "I want bounding boxes no smaller than 1.5 centimeters," and then give you an image like this. Unless you build these kinds of constraints into the UI tool, it's almost impossible to do a good job of completing that kind of task in any reasonable timeframe. So having a good annotation tool, with the right hooks in place so you can quickly customize it to a particular client's needs, becomes extremely important. (I'll sketch at the end of this section what building such a constraint into the tool can look like.)

Next is domain specificity. As I said, we'll have a solution architect who is a domain expert sit down with the client's domain expert, and together they unpack the jargon. Both are domain experts, but they now need to work with people, in our case people coming from underserved communities, often outside the formal education system. How do you unpack the jargon? How do you create deep and narrow training curricula and deliver them through videos and video conferences? And how do you do it so that the learning improves over time and is retained over time, through the right workforce allocation techniques?

Then, allowing open feedback. This is extremely important, because we should not look at this as a plain client/service-provider relationship. In traditional software development, we are used to appreciating that requirements engineering is a highly collaborative process: there's constant back and forth with the client, because unless you really invest in that phase, the rest of the development is bound to suffer from all kinds of problems.
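As flagged above, here is a minimal sketch of building the client's size constraint into the tool itself. The threshold and names are illustrative; the client's "1.5 centimeters" would be converted to pixels for the given image resolution and display scale:

```python
MIN_SIDE_PX = 12  # hypothetical pixel equivalent of the client's 1.5 cm spec

def validate_box(x1: int, y1: int, x2: int, y2: int) -> list:
    """Checks the UI runs before accepting a drawn box; returns any violations."""
    errors = []
    if x2 - x1 < MIN_SIDE_PX:
        errors.append(f"box width {x2 - x1}px is below the minimum {MIN_SIDE_PX}px")
    if y2 - y1 < MIN_SIDE_PX:
        errors.append(f"box height {y2 - y1}px is below the minimum {MIN_SIDE_PX}px")
    return errors  # an empty list means the box is acceptable

print(validate_box(10, 10, 30, 18))  # height is 8px -> rejected with a message
```

Rejecting (or snapping) undersized boxes at draw time is far cheaper than catching them in downstream QA.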
So having an open conversation, with feedback flowing both ways, is extremely critical. This can be conversation around quality, and around understanding what kinds of errors can happen: which errors are acceptable and which are not, or at least which errors are less important and which are more important in the context of a particular use case. For example, for a self-driving car, you might miss annotating a car far down the road, one so small you have to zoom in to see it, and you might miss a car right in front. The first is a mistake and the second is also a mistake, but the first is a much less expensive mistake, and the client might be willing to live with it. Then there's sampling for quality. This is also important, because we have seen that clients, because of shortage of time, sometimes take very small samples. The problem is that there may be misunderstandings that simply do not show up in that small sample, so over a period of time the mistakes go unnoticed, and when they show up later, there's a lot of rework: many, many batches have to be redone. The opposite can also happen: you take a very small sample, it happens to come from a few team members who recently joined and made a few mistakes, and based on that an entire batch gets sent back, which is unnecessary rework. (The sketch below shows why sample size matters.) So having a safe space where we can iterate without paying a penalty, with small discovery and calibration rounds and lots of questions about edge cases, is a very healthy practice. We benefit a lot from it, and we always recommend that others make use of it too.

Being able to do all of this at scale is not possible without technology, and we invest quite a bit in technology in-house. I actually have several of my technology team colleagues here from iMerit, and I'm really speaking on their behalf. There are broadly three areas in which we invest. One is workforce and delivery management. We have IMPP, the iMerit People Platform, which keeps track of things like attendance, rostering, timesheets, and billing: all apparently very simple and boring things, but these are what really keeps the health of the company going. If mistakes creep into billing, that can cause a lot of problems for a service provider. It also tracks productivity and quality metrics, and it's integrated with external systems we might have purchased from other vendors, including systems for HRMS, client relationship management, opportunity management, financial management, and so on. Second, there's a lot of work on building client applications for data labeling. We have an in-house tool for computer vision called Lightning; I'll show you an example of it. It supports various kinds of annotation techniques: bounding boxes, polygons, key points, semantic segmentation, et cetera. Similarly, there are tools for natural language processing: named entity recognition, sentiment analysis. And third, because of the kind of organization iMerit is and wants to be, there's a lot of focus on talent management and learning and development. Whereas a lot of this is quite discretionary in other companies, for us it is a very core business function.
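To see why tiny QA samples mislead, here is the standard binomial sample-size calculation, sketched with illustrative numbers (not anything from a client agreement):

```python
import math

def qa_sample_size(expected_error=0.05, margin=0.02, z=1.96):
    """Items to sample so the measured error rate lands within +/-margin
    of the true rate at ~95% confidence (z = 1.96)."""
    return math.ceil(z**2 * expected_error * (1 - expected_error) / margin**2)

# A 30-item spot check cannot distinguish a 5% error rate from a 7% one:
print(qa_sample_size())              # 457 items needed for a +/-2% bound
print(qa_sample_size(margin=0.05))   # even a loose +/-5% bound needs 73
```

This is why both failure modes happen with small samples: systematic misunderstandings hide, and a few unlucky items from new team members look like a batch-wide problem.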
So a lot of technology is used for automating assessments, for blended learning programs, for building capability maps, and for knowledge management. We talked about domain knowledge: how do you actually codify it? How do you make it easily searchable and discoverable for new people joining the organization?

Here is an example of Lightning; we have some of the core developers sitting in the audience. What this shows is an image, and on the right-hand side several kinds of controls available for working on it. You can choose, for example, what kind of annotation you want to do. You can design a taxonomy you want to apply to the image, import that taxonomy, and then select what kind of object each of these is. You can also flag that a certain object is occluded or truncated, meaning it's only partially visible, cut off by other objects or by the edge of the image. We have taken this a step further and integrated automatic evaluation on top of the tool. What you see here (I don't know if it's visible at the back) are two kinds of rectangles, one red and one green. First, a subject matter expert uses the tool to create what's commonly known in machine learning as a ground truth or gold standard: the ideal annotation for a particular image. Then another annotator comes in and annotates the same image; we use this quite heavily in our entrance tests for computer vision. An algorithm in the background tries to map their annotations onto the ideal annotations (I'll sketch that matching below), and it comes back with feedback: how many annotations were correct; how many were almost correct but had a few mistakes in the attributes; whether there were any redundant annotations; and whether any annotations were missing, which is essentially about recall, missing out on very important pieces of information. All of this is fully automated, and when the report comes back, it gives us a good sense of a new person's potential to do computer vision tasks. Typically we place more emphasis on eye for detail, because not everyone has that and it requires a lot more training; accuracy is a bit easier to fix. We have also started extending the tool to be more interactive, and we're going to use it for cross-skilling people who want to move from one domain to another within the organization.

So I'll end with this: capability maps. A capability map is a model of what a person knows. As a service provider, as I said, we have close to 3,000 people working across hundreds of clients. When a new project comes in, it's very difficult to figure out the best set of people to put on it. For that you need to know what each person knows or does not know, their strengths and weaknesses. So optimal staffing is one driving force for this. The second is training needs analysis. As I said, we invest a lot in learning and development, so having a model of what a person knows is very important for knowing what kind of training that person needs to go through.
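iMerit hasn't published its matcher, so here is a minimal sketch in the spirit of what was described: greedily matching a candidate's boxes to gold-standard boxes by intersection-over-union (IoU), then bucketing the results into correct, almost correct, redundant, and missing:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def score(gold, pred, thresh=0.5):
    """gold/pred: lists of (box, label) pairs. Greedy IoU matching."""
    unmatched = list(range(len(gold)))
    correct = almost = redundant = 0
    for pbox, plabel in pred:
        best, best_iou = None, thresh
        for gi in unmatched:
            v = iou(gold[gi][0], pbox)
            if v >= best_iou:
                best, best_iou = gi, v
        if best is None:
            redundant += 1        # no gold box overlaps enough
        else:
            unmatched.remove(best)
            if gold[best][1] == plabel:
                correct += 1      # right place, right label
            else:
                almost += 1       # right place, wrong label/attributes
    missing = len(unmatched)      # gold boxes never matched: the recall gap
    return dict(correct=correct, almost=almost,
                redundant=redundant, missing=missing)
```

From these four counts you can derive precision- and recall-style scores per candidate, which is what makes the entrance-test report fully automatic.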
This is not the actual user interface, but it's what we are working on; hopefully the first version will be available in a couple of months. It tells you, for every person on the annotation team, what that person's specializations are. If you look at this capability map, this is a person who works on finance annotations, and the color codes indicate that this person has a high degree of capability in finance data extraction. You can drill down and see why that's the case: here, because the person has done a lot of tasks on finance projects and has demonstrated high throughput and high accuracy. We're also pulling in assessment data from the learning and development team, so one set of performance inputs comes from delivery and another from learning and development. Similarly, there are non-functional attributes that are very important as well, particularly things like attention to detail and tenacity. Annotation is not for the faint-hearted, right? It requires a lot of concentration for long periods of time. And communication, because that's extremely important, particularly with the client. There are programs we are building as part of L&D that focus on things like improving communication skills, and assessments are carried out on those as well. Overall, this gives you a sense of a person's strengths and weaknesses, based on which learning recommendations can be made. For example, this person might be recommended an advanced natural language processing course, because they already have some expertise there, or an advanced course on writing better emails, because there's a need to improve clarity. So far, when we had even a few hundred annotators, it was possible for some very knowledgeable managers to decide who got to work on a project, but that simply isn't scaling. This is going to feed directly into our rostering tool: anyone starting a new project will have access to the capability maps, and in fact recommendations will go into the rostering tool to suggest the right people.

OK, so let me summarize the key points. I hope this talk gave you a sense of how an enterprise-grade data annotation managed services organization might work. For those of you who are early on the AI journey, the key message is really: please treat your data strategy as seriously as your algorithm strategy. I think a similar point was made by Surya Juna and other speakers. Understanding the data, understanding the purpose you're after: you have to reflect a lot on that before you jump on this bandwagon. Ask the right questions about the data; plan your time and budget; look at what skills you need and whether you have them in-house; consider whether your use cases are such that you can manage with a crowdsourced team, or whether you really need to partner with an enterprise vendor. And then create an environment where you engage very closely with that organization, with that partner: where you go over the data needs together, discover insights together, and discover edge cases together. That really sets the foundation for building a long-term, secure, and scalable pipeline. So thank you.
If you have any questions, I'm happy to take them. Also, I apologize once again for my health issues; I hope it wasn't too much trouble.

Hi Vikram, thanks for the nice talk. On the first slide you mentioned Figure Eight. Are you guys also doing anything related to speech recognition transcripts?

Speech recognition? I think we did a little, but probably not too much. Do we do a lot of speech recognition or not? Not too much, I would say. iMerit works mostly on computer vision and content. Content means both text (natural language processing, named entity recognition, sentiment analysis) as well as any other form of content on the internet: e-commerce use cases, content moderation, and so on. Speech, a little, I've heard, but I wouldn't say it's a very significant area so far. Clearly, if the demand builds up, we would certainly be interested.

Really enjoyed the talk, and I had a question. When you give tasks to people, do you use models to assess them before they annotate? Because that might help with productivity, but it might also create bias. Do you do that at all?

Assessing people before they're placed on the project?

Yeah. For example, with bounding boxes, there are some not-so-great models you could use to create a first version that they then refine when they are annotating. Do you do that?

So before we place people on a project, they go through training. Part of that involves a pre-training assessment and a post-training assessment, where they are given tasks to do, and typically, as I said, we have an automated assessment system for that.

Sorry, I meant to ask: on a task, do you use a model to give them a rough version of what they should be annotating before they actually do it?

So you're looking more at a productivity gain from it?

Yes, exactly.

OK, so far, well, from a learning and development perspective we can certainly do that, just to see whether they're able to fix things. For example, the tool we talked about: you could extend it to see whether they can fix certain things you have deliberately left out, and so on. One thing we have been working on: I talked about annotating across frames, and often that becomes very repetitive, because imagine you have annotated one frame of a video. In the next frame there may not be many changes, because your sampling rate is probably quite high. So we are finding ways to predict what the next position of, say, a car would be in the next frame, and let the person make only small changes to that. Instead of annotating the whole frame again, you make only small adjustments, inheriting the annotations from the earlier frame in a sense, but also changing them, because some positions will have shifted. It's not that you just take the previous frame and impose it; you try to figure out what the new positions would be and then let people make small corrections to them. That way you can save a lot of time.

The tools that you developed for data annotation: do you think there's a new business opportunity there, where many companies might need and find those tools useful?

Yeah.
So we would be very happy if that were the case. To some extent we have clients who certainly make use of our tools, but since our clients are mostly AI companies, they often have their own tools and prefer us to use those. Our strategy is to be tool-agnostic: we're happy to go with whatever tool best meets the needs, and we will certainly recommend our own tools if we believe strongly that they are the best fit for a given task. In terms of business, it is an opportunity, but at this stage the whole field is, I would say, still maturing. As a few use cases, for example autonomous vehicles and a few others, mature, they will reach a level where you could consider doing that. At this point, though, we are focusing more on technology as such. For example, the platform we have developed for managing a data annotation workforce: that is certainly a product. Some of the tools we are starting to develop, like those for panoptic segmentation, have a lot more innovation built in, and it's easier to see a market for them. The lower-end, low-hanging fruit, like bounding boxes and so on, is well covered right now; there are many solutions in the industry that provide that. The key, as with any product, is to find both a need and an unmet need, and in some of these areas we are starting to see that. In those areas we certainly want to develop products.
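To illustrate the frame-to-frame propagation idea mentioned in the Q&A: here is a minimal sketch assuming a constant-velocity motion model (the actual method wasn't described in detail) for predicting a box's position in the next frame, so the annotator only nudges it rather than drawing from scratch:

```python
def propagate_box(prev_box, curr_box):
    """Predict the next frame's box with a constant-velocity model.

    Boxes are (x1, y1, x2, y2). prev_box and curr_box are the same object's
    annotations in frames t-1 and t; the return value seeds frame t+1 for
    the annotator to adjust.
    """
    # Per-coordinate displacement between the last two frames.
    velocity = [c - p for p, c in zip(prev_box, curr_box)]
    # Assume the object keeps moving the same way for one more frame.
    return tuple(c + v for c, v in zip(curr_box, velocity))

# Example: a car drifting right by ~8 px per frame.
frame1 = (100, 200, 180, 260)
frame2 = (108, 200, 188, 260)
print(propagate_box(frame1, frame2))  # -> (116, 200, 196, 260)
```

With a high frame rate the prediction is usually close, so the human correction per frame drops from minutes to seconds, which is where the claimed time savings come from.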