Cool. Happy to be here. I'm Bharath, and I run a company called Sensara. We are trying to reimagine how television should work, keeping data science at the center of it. I'm sure you'll have a lot of questions, spanning ethics, privacy, what is possible and what's not, whether people watch TV at all, everything. I'll be around to take a lot of these questions. The focus of this talk is much more on the data science part, and more specifically the information retrieval part: we are trying to unravel television information retrieval here.
How many of you watch television? And how many of you say, "I don't care about TV, I stopped watching some time back"? Okay. And what were the reasons you stopped watching? Advertisements. In fact, in our company we watch a lot of TV. We do not watch TV at home, but at work we all watch TV. And guess what we watch? We watch ad breaks. You'll see why.
So I'm mostly going to talk about our experiences building this product called adbreaks.in. I've done a lot of work in information retrieval; I dabbled in search for about 15 years, worked at Google, and got a PhD along the way from IISc. But this is the best piece of work we've done. Abhay and Elvis, who have contributed a lot to this work, are also here. We wanted to document and share what we have learned, so that's the effort here.
You'll hear a lot about adbreaks.in. In terms of terminology, we use a lot of frequent sequence mining. There were architecture decisions that we had to make along the way. An interesting question is how you even measure a system like this: how do you know how good or bad it is? While you see all this, you will start thinking: what if there were TV ad blockers? That will probably be the first question that comes up. Privacy is obviously a big one.
You'll start thinking about the future of TV. And at large, what is this notion of TV information retrieval? I spent a lot of time on search, but the nature of search is so different in television, because these are real live streams that are moving. In fact, I could see some parallels with the previous talk, because he's also looking at real-time data. TV is very much real-time as well.
Very quickly, one more piece of ground-setting: I'm mostly going to talk about linear TV. Linear TV is the TV that you always knew. This is the TV that comes through your set-top boxes: Tata Sky, Airtel, maybe Hathway. Linear TV is like a set of rivers that are flowing. The content is streaming, and someone has always chosen what is going to come at different points in time. You can just dip in and drink from a stream. You can't stop a river for long. Of course, there are DVR systems, but that's not what I'm going to focus on.
The opposite of this is nonlinear TV, which is the Netflix model. The way I look at Netflix, it's basically the natural evolution of DVD players: instead of having the DVD or CD at home, or running to a library to get one, you run to the internet, download it, and then watch. We are not going to focus on nonlinear TV, because it is a comparatively simple problem. Linear TV presents much harder problems to solve. And this is where the volumes are throughout the world. Even though there's a lot of PR around nonlinear TV, much of it fueled by Netflix, most people are actually glued to linear TV.
What's interesting now is this notion of connected linear TV. So what is connected linear TV? There have been probably 60 or 70 years of work on television, but it was always designed with broadcast in mind. There was no way of getting feedback.
Some broadcaster pushed content to a satellite, or maybe through terrestrial transmission, it came down the wire, and then you had conditional access systems that did the encryption and all that, but there was no feedback at all. That has just changed, because the box on which you are watching television is getting connected. Either the box is becoming a smart box, and you increasingly see this happening in India (in fact, both Videocon and Airtel have just launched their hybrid TV boxes), or you might be controlling television through a smart remote on your phone. You might say, "Let me decide what to watch on TV, but I'll decide using my phone." And the phone is a connected device.
We have a product here. We have a small infrared blaster that you can keep in your home. Once you do that, your mobile phone, along with our app, becomes a smart remote. You can then talk to it: say "Star Plus" and it changes the channel to Star Plus. You can say, "Take me to a channel that has cricket right now," and it'll take you there. So we have a conversational voice interface built into our phone app.
We already have a lot of users. It's hard to make out here, but I hope you can see the small bubbles on the map. These are actual users of our app and product all over India, so we have a lot of data coming in from everywhere. And what's interesting here is that we know what people are watching.
So, this question: do people watch TV? Here is an interesting graph that we've been looking at. In fact, I love this graph. Every time somebody asks, "Are people watching TV?", I want to show this. The pattern is always there. These are different days of the week; I just have the last week's data here. What do you notice?
Why do they always look the same? TV watching is a habit. People just go back home, switch on the TV, and watch. It always peaks at 8 p.m. every day, which in the TV world is called prime time. And this is all data that we get from our product. There's always an afternoon bump. People finish their lunch and then want to relax. It could be largely housewives: before lunch they need to cook and all that, and after lunch they have some time to spend before the tea break starts. So there's always an afternoon bump. What do you notice here? This is interesting: on Sunday, the afternoon bump is way higher. On Sunday afternoons, people tend to watch a lot of TV. The other thing to notice is that people put in a solid three hours of TV every day, every day of the week. And this is a graph that just repeats forever, even with our app. So what we've noticed is that it's a habit: people sit in front of the TV and continue to watch.
The important thing for you is that the data is coming. You know what people are watching, because they're watching TV through connected interfaces. Now what can you do with all this data? You were looking at a system that was built entirely with broadcast in mind, with no feedback. Now data is coming. Are you ready for all that data? It turns out the TV industry is not ready. They have no clue what's happening on TV, because all of that information is lost. It's a complex ecosystem with six or seven different players: one party broadcasts, a different party produces the content, a different party produces the metadata. All of this information is lost, and the people who actually care do not have access to this data.
And if somebody says, "Yes, I want it," the whole chain has to change. So we looked at this problem and asked: can we use machine learning to do something about this? Our goal is this: can we know exactly what is on TV? I'm not Star Plus. I'm not HBO. But can I know what's running on HBO right now? Not at the level of "this show is running at this point in time", but can I know what's on screen? Do I know what ad is running? That's our goal. We want to prepare television to the extent that you actually know what data is available.
The state of metadata on TV today is this. It's a picture most of you will recognize: your TV guide, something that you hate. Some of you may have stopped watching TV because of this. There's a contest that we used to run. We'd randomly tell a person, "Go to Star Plus now," give them a TV and a remote, and they'd need to go down hierarchically from the top, spending five minutes just to get to a channel. So what most people do is stop looking for something they want and just press the up and down arrow keys, and they spend hours like that. So this is the guide. What does the guide show? It shows what show is running, maybe a little bit of information about it, but nothing more.
Can we get a lot more information? Now that we are saying it's a connected television, we want to prepare the world for getting a lot more information. Can we get information about the news topics you just watched? Are you interested in the Bihar episode that's happening? Are you interested in Trump? At what point do you switch channels away, given that we know what's on TV at that point in time? The songs that you actually wait for: you put on a channel like B4U Music, suddenly a song that you like comes up, and you turn up the volume. For which song do you do that?
Do you have the data? The actress you did not miss. The television industry itself is trying to be a lot more data-driven. You will have noticed that TV serials are becoming long serials; they run for several months. Now, do you know which actor or actress evokes emotion in people, so that they tend not to switch away from the channel at that point? If you did, you would probably give them a little more airtime the next time and make them a more central character in your script. So can you bring that up? The ads you did not switch away from. We are getting to a world where we are trying to get this kind of data insight into what exactly is happening on TV.
For the rest of the talk, I'm going to stick to what we did with ads. Of course, we have demos of what we are doing with actors and so on, and I'm happy to show you those through the day. In essence, if some of you have bought an Amazon Fire TV or have been using Prime Video, we're trying to build something like X-Ray, but for linear TV. X-Ray is this small product where, at any point while you're watching a movie, you can pause it, and it comes up with small overlays on the screen saying, "These are the people on screen right now," and you can then jump off and get more tidbits about these individuals. That was easy because they were looking at canned content. When somebody makes a DVD, they tag it with scene-level information and send it out to you. So it was easy to build that product. But television is live. Some news anchor comes on, he's in the field, a lot of different things are happening. Can you build that level of information for linear TV?
So I'll quickly give you a demo of our product, Ad Breaks, and I'm going to venture to give you a live demo, because the network has been stable today. This is a product that you can also go and check out.
What we've done is build a comprehensive repository of all TV ads that are running in India. It's a system that learns by itself. Nobody tells us, "These are the ads. These are house promotions." (A house promotion is an ad for a TV show.) Nobody tells us these are sponsors, these are songs. We just watched TV for a period of time, and we continue to watch it all the time. There's a real-time system that keeps learning, and it automatically comes up with: this clip is an ad, this clip is a house promotion, this clip is a title sequence, this clip is a song. Ad Breaks shows you the ads part of this, along with some interesting statistics.
We've been monitoring about 250 Indian television channels. We are continuously mining them, and we've discovered that over the last week alone, 12,000 unique ads have been running on Indian television. How many times did they run? Look at this number: 640,000 times. So 12,000 ads were repeated 640,000 times. This is why some of you stopped watching TV: the immense amount of repetition. The reason for that repetition is that they didn't have feedback. If you build better systems, they can come back and say, "This guy has seen this ad four times; I'm not going to show it again." You can build better post-life systems there.
Now, this is a live system. If someone could grab a TV somewhere, you would see that it's actually changing all the time. Life OK HD has this ad for Hair & Care Fruit Oils. Vivo soap is coming on Sun TV right now. Pix HD has something, Bajaj Pulsar. Let me see if I can click through, and while it loads, you can see it's all dynamic. And this is all built up automatically. It's a system that we've built.
For every ad, we know how many channels it has played on, the breakdown across them, the distribution across languages, and the different creatives and media for it. This is a Pulsar ad that we discovered in Telugu. It might pause, but that's okay. And it exists in different languages; this is a Hindi variant. All of them have been put together to make up this page. We also know who comes before, who comes after, and so on. And this was all being shown in real time; the live interface was showing this in real time.
Now, what if I take a full, long TV stream and start putting markers on it, saying, "This is what happened at this point in time"? I'll give you a visual of that to show you how our X-Ray product actually works. This is a Star Plus timeline. It is a real-time system, but to make it easier I'm going to play back something that we stored yesterday. This is yesterday's Star Plus at close to 3:30 or so.
[Clip plays: "I want you guys to be in top 12. This performance... bar, bar, bar."]
So, standard TV; you will have seen stuff like this. But what I want you to notice is that, as the timeline proceeds, we are able to detect and tag what actually came in that stream and mark it out. I'm going to read it out because it's a little hard to read here. [Clip plays: "... Dual Selfie Camera, Oppo F3."] So we've detected the Oppo ad on screen. This next clip turns out to be a house promotion. It's an ad for another show called Is Pyaar Ko Kya... I don't know. So what we got was just this. As the scene changes, we are able to tell: okay, this is Big Bazaar. [Clip plays: "We are testing Ambi Pur against tadka. The house is smelling. It's a strong smell. Ambi Pur. It's very fresh. It's very sweet."] In fact, this is an internal interface.
What we're also showing is that, as the stream was flowing, we are trying to match against many different variants, because Ambi Pur will have something like 12 variants for different languages. So we are trying to find the best match, and that's what is showing up here. This guy was speaking in English, so it looks like the English one has the best match. Even I can't read it from here. It had a 75% match on that, and the Hindi version had a 56% match. So we are trying to see which one is the best fit, and then taking a call on it. Let me move forward a bit, to the end of this. [Clip plays: "Awesome new workout. But the same old hair removal cream."] Now let me zoom out and you'll see the scale of this. This is the whole day. We've mapped all of TV and put tags on it, including who was on screen and so on. This is the product that we have, and we have customers for it.
But let me go through how exactly this was done. Let's think about linear TV for a while. What goes into linear TV? What is it made up of? Try to think from a content production angle. There's obviously the content segment. The content segment is the content itself, which is why you're switching on the TV; the channel has put all the other things together around it for you. Then you have the title sequence. I'm going to play these clips so that you get a feel for what they are. [Clip plays: "What is it? It's a solution. Hey, dear." followed by the title song of Taarak Mehta Ka Ooltah Chashmah.] So this is the title sequence. It plays every day with the episode. In fact, this is the longest-running show ever on Indian TV, I think; it has all the records. And there is a channel that, out of 24 hours, plays this for 18 hours. So this title sequence has probably been repeated an enormous number of times. This is the title sequence.
Nobody told us this is the title sequence. We need to figure that out. And then this is a break marker. [Clip plays: Taarak Mehta Ka Ooltah Chashmah break marker.] A lot of people can recognize a break marker even if they're in the next room. They're like, "Okay, break khatam ho gaya (the break is over), come, let's go back." So this is a break marker. Break markers are like sentinels before and after the ad break. Nobody told us that this is a break marker. In fact, these are all real clips that our machine was able to generate and that I've put in here. This is a house promotion. [Clip plays; the audio is mostly unintelligible in the recording and ends with a mention of Patanjali honey.] All four of them were about Taarak Mehta: one of them is trying to advertise tomorrow's show, one of them is a small break marker, one of them is a title sequence, and the other one is an ad. [Clip plays: a Hindi ad, apparently for Dove soap.]
The job is to reverse engineer these things out of television just by watching TV. It's not humans that are watching; our machines are watching, and they're trying to get these clips out. In other words, can you take the full video stream of television and come back with these markers? There are tons of applications for this. For example, you may want to convert linear TV into catch-up TV: you segment out the content parts and the ad parts, stream the content online, and then choose to put different types of ads there. So it's important to know what TV is made up of, because the broadcaster doesn't even have the technology to do all these things himself. Let's think a little more about what it means to build metadata for ad breaks in particular.
We are doing a lot more, but for the purposes of this talk I want to focus on ad breaks. TV is continuously streaming video. Nobody told us what the ads are, and every day tons of new ads come up. It's 60 FPS of pictures with aligned audio and some EPG (electronic program guide) metadata, so you know that a given show is supposed to start at this time and end at that time. That is the context. There are no standards or compliance requirements. There are 800 different channels, with new channels coming up all the time. This is just India; worldwide, across languages, they tend to use different kinds of systems. So it's not just an IT problem, because there's too much diversity. There are no watermarks. And if you don't know about watermarks, don't bother, because they don't help.
TV looks like this regular expression. You have a content segment, then an optional break marker, followed by one or more ads or house promotions. Then you have an optional break marker again, and then the content segment continues. This is the regular expression, this part is the ad break, and we want to get all these clips out and start tagging them. So let's ask: are there statistical properties of each of them that we can exploit to build a system that works?
One thing to note: I've been trying to call myself more of an information retrieval guy, because you care more about the semantics of information and whether you can exploit them in a given domain to make your job easier. For me, machine learning is a tool. If machine learning works, good. Whether you have to use supervised or unsupervised methods doesn't matter; whatever works. But having some domain knowledge helps you solve a problem faster. So you want to quickly start understanding the domain and see what works here. Break markers.
They are signature sequences of video or audio that are specific to a channel and/or a show. Break markers have this interesting property: the break marker for Taarak Mehta is going to be different from the one for Ishqbaaz, from the one for That '70s Show, from the one for, say, Silicon Valley. The break markers are always different, but they always repeat whenever a given show comes on. So if you analyze the regions of video that always appear when this show is telecast, you have a better chance of finding break markers. They're very seldom seen elsewhere, so that gives you some clues. They are always sentinels that separate the content from the ads: there's one at the start of an ad break and one at the end.
What about ads? Ads are short, very short. It costs money to put them on air, so ads are 10-seconders, 15-seconders, 20, and so on up to 30 seconds. Very few ads are a minute long. They are heavy on audio. The thing with advertising is that even if you're not watching the TV, you should be able to hear the brand. That's how ads are designed. They repeat a lot; this is what you're all annoyed about. They occur across channels. They're not loyal to one show or one channel, because the advertiser wants to maximize impressions. And they always come together; they are bundled together. So there's a lot of locality among ads. If you find one ad somewhere, there's a very high likelihood that there's another ad either before it or after it. So you can use these properties.
What about house promotions? They feel very similar to ads. They always come in ad breaks. They're like ads for other shows, but they're different. The thing about house promotions is that they change every day.
Typically an ad stays the same for an entire campaign, maybe about a month, or if the advertiser has a lot of money, he might put out something new every week. But house promotions change every day, because for tomorrow's show you need to air a house promotion today, and for the day after you will be airing something tomorrow. So they change a lot, and they also feel very similar to ads, which makes them a little hard to distinguish. What's interesting is that house promotions always feature actors who otherwise appear in content segments, which is not so with ads. So if you have a mechanism to recognize faces that come up a lot in content segments, you know that they are most likely going to occur in house promotions as well.
Content segments: the lead actors of these shows tend to appear a lot. That's why they are the stars of the show. The show metadata that you get from the EPG is most likely going to give you their names. And if they're reasonably popular, they have some web presence, which means that some photographs of them are available. So if we have an index of those photos, maybe we have a chance of identifying when these people come up on TV.
Again, the broad theme is that TV has a lot of repetition. So can we generate a dictionary of features? The central theme of how we operate is that we watch TV over a period of time, and because TV has a lot of repetition, we gather features that are very characteristic of television. We gather those features, build a large repository of them, and start looking for them again in live television. For example, ads, house promotions, and break markers are all repeating clips. People also tend to recur, so we can probably have repeating facial features. So the whole idea is: can we learn features in real time and then use them to tag streams on the fly?
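The earlier remark that TV "looks like a regular expression" can be made concrete by giving every tagged segment a one-letter label and running an ordinary regular expression over the resulting string. The following is only an illustrative sketch: the labels (C for content, B for break marker, A for ad, H for house promotion), the function name find_ad_breaks, and the sample timeline are assumptions made for this example and are not taken from the system described in the talk.

```python
import re

# Hypothetical one-letter labels for already-tagged segments:
#   C = content, B = break marker, A = ad, H = house promotion
# An ad break is an optional break marker, one or more ads or house
# promotions, and an optional closing break marker.
AD_BREAK = re.compile(r"B?[AH]+B?")

def find_ad_breaks(labels):
    """Return (start, end) index pairs of ad breaks in a label string."""
    return [match.span() for match in AD_BREAK.finditer(labels)]

# Made-up timeline: content, a marked break with three clips, content,
# an unmarked break with two clips, content.
timeline = "CBAAHBCAHC"
print(find_ad_breaks(timeline))  # [(1, 6), (7, 9)]
```

In practice the hard part is producing the labels in the first place, which is what the rest of the talk is about; the expression only states the structure that the labels are expected to follow.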
So I'll take you through the algorithmic sequence that we follow. The whole idea is to look for clips that tend to repeat often in television. But where do you look for these clips? What is the infrastructure? Are you going to attach a computer to your cable connection and do it there? You need some place to actually do all this mining. So here is how TV actually comes to your home; it's important to know, because it sets the context for where you're going to do the mining. Generally, what happens is that a TV channel, let's say HBO, uplinks its signal to a satellite, and from the satellite there's a downlink. Then there are these things called multi-system operators, who are your TV operators: Airtel, all the cable operators, and DTH operators like Tata Sky. They have dish farms where they receive all the incoming TV and analyze it. After this, they encrypt all of it and push it through the wire to your homes. So we've done a deal with these operators and put our data centers on their premises, which means that we get unfettered access to all the TV channels across India in one place. We've put our machines there, and we do all the analysis there.
The other thing to note is that video processing is very, very expensive. At 60 frames per second, if you try to analyze every frame, you're going to burn all your cash on hardware alone, and Abhay is in fact talking about wanting GPUs, which makes it even more expensive. So you need to come up with ways to make this more efficient. What we say is: ultimately we want video, but let's start with audio, because audio is cheaper. So we strip the video out of the TV stream and start with just the audio. Audio looks like this. But again, audio processing means waveforms, frequencies, amplitudes, and all that.
It's still not a convenient representation for analysis. So what we do is convert the audio into a number sequence. There's a known method for this called acoustic fingerprinting. If you've heard of Shazam, which is a sound recognition system: somebody's playing a song, you open your phone, and it recognizes what song is playing. It's a similar method. In acoustic fingerprinting, you take some piece of audio and come up with a number sequence for it. So at this stage, we've converted a TV stream into just a sequence of numbers. Now our job is easier. We just need to look for repeating sequences of numbers in this very long stream, which happens to be a well-known problem in data mining called frequent episode mining. And, magically, all these audio clip candidates keep coming up. They're not perfect, and you need to do a lot of post-processing, but this is the core: we've converted it into a number problem, and then we look for repeating sequences of numbers, maximal repeating sequences, and these are all interesting clip candidates that we can start working on.
So now you have an interesting audio candidate. Let's go back to video. We had a large space of video to work on, and we've just stripped out most of it, because we only have to look around the interesting audio candidates. So the search space has been drastically reduced. Now we go back into video. If the audio candidate ran from, say, T seconds to T plus 10 seconds, we take from T minus 4 to T plus 10 plus 4 and gather all the frames in that window. Then we run one more process of maximal frame mining, this time in terms of key frames. Here's an example of that. You take this clip; something else could have come before it, and something else could have come after it.
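The audio stage described above (turn the stream into numbers, then look for maximal repeating runs of numbers) can be illustrated with a toy sketch. This is not the frequent episode mining algorithm used in the system; it is a naive seed-and-extend stand-in that assumes the fingerprints match exactly, which real acoustic fingerprints generally do not. The names mine_repeats and video_window, the parameters seed_len, min_count, and pad_s, and the sample data are all hypothetical; only the 4-second padding comes from the talk.

```python
from collections import defaultdict

def mine_repeats(stream, seed_len=3, min_count=2):
    """Find maximal repeated runs in a sequence of fingerprint numbers.

    Naive seed-and-extend: index every run of seed_len numbers, keep the
    runs seen at least min_count times, then grow each one left and right
    for as long as all of its occurrences agree on the neighbouring value.
    Returns {clip as a tuple: sorted list of start positions}.
    """
    seeds = defaultdict(list)
    for i in range(len(stream) - seed_len + 1):
        seeds[tuple(stream[i:i + seed_len])].append(i)

    clips = {}
    for starts in seeds.values():
        if len(starts) < min_count:
            continue
        left, length = 0, seed_len
        # Grow left while every occurrence is preceded by the same value.
        while (min(starts) - left - 1 >= 0
               and len({stream[s - left - 1] for s in starts}) == 1):
            left += 1
        # Grow right while every occurrence is followed by the same value.
        while (max(starts) + length < len(stream)
               and len({stream[s + length] for s in starts}) == 1):
            length += 1
        first = starts[0] - left
        clip = tuple(stream[first:first + left + length])
        clips[clip] = sorted(s - left for s in starts)
    return clips

def video_window(start_s, end_s, pad_s=4):
    """Pad an audio candidate's time span before going back to video frames."""
    return max(0, start_s - pad_s), end_s + pad_s

# Made-up fingerprint stream: the run 1, 2, 3, 4 stands in for an ad that
# airs three times with different material around it.
fingerprints = [7, 1, 2, 3, 4, 9, 8, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 0]
print(mine_repeats(fingerprints))  # {(1, 2, 3, 4): [1, 7, 13]}
print(video_window(100, 110))      # (96, 114)
```

The padding step mirrors the T minus 4 to T plus 10 plus 4 window: the audio match only gives approximate boundaries, so the video stage looks at a slightly wider span before deciding where the clip really starts and ends.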
You take that, look at enough samples of it, and see which part tends to repeat often. You do this, and you end up with a clip. You can see that this example was clipped out of the channel Sony SAB. It's not some clip we just got from YouTube, and we get it right from the first frame to the last frame. The next frame after this would have been something else. This is just beautiful. And we do this at scale: every week, 12,000 such clips come bubbling up. Some statistics again, which I went over earlier. This is the output of the system. It's a real-time system that has been deployed, and people are using it to make money as well.
Why do all this? What kinds of things can you get out of it? I'll read out some insights, since they might be hard to see here. From the datasets that we have, we're able to come up with insights like these. What are the ads watched on TV by people with the Twitter app on their phone? How many of you have the Twitter app on your phone? Looks like everybody. You all watch TV. Now, can Twitter know what you watch on TV? You're all loyal followers of Twitter. Does Twitter know what ads you watch on TV? These are two different islands of information, the web and TV. We're able to connect them. This is interesting: a lot of Twitter users have seen the ads for Bharti AXA Insurance, Skoda Rapid, Jeevansathi.com, and Datsun redi-GO. So this is a huge monetization opportunity for Twitter itself. Suppose you have the Twitter app, you're in front of the TV, and you see this Lakmé Iconic Kajal ad come up. Then you start browsing Twitter, because obviously it's an ad break, and you start seeing the same ad show up there. The next time that happens, you know who's behind all this. This is called TV-synchronized advertising.
There is huge demand for this, because you have a world of brands spending so much on TV and increasingly spending on digital media, but the two are not even connected. Another interesting one: the top ads seen on TV by owners of the Redmi Note 4 and the Mi 5. Both are Xiaomi phones, and you'd say, "They're just two Xiaomi phones." But what kinds of ads have people with a Redmi Note 4 seen? They have seen the Shaadi.com ad, Jeevansathi.com, and Titan Skinn perfumes. What about people with a Mi 5? They've seen Berger Anti-Dust paint, Honda City, Tourism Australia. What does this tell you? This is real data, so it has to stand for something. This was not by choice. It shows a lot about preferences. When somebody says, "I'm not going to buy a Redmi Note 4, I'm going to buy a Mi 5," you know there is something in his mind that he psychographically associates with; maybe he's a guy who will fall for a Honda City ad. So all these interesting patterns come up. Now, what does this data mean for Xiaomi? Think about it.
Another interesting one: Airtel Payments has been doing a lot of advertising. Now, what other apps are on the phones of people who have seen the Airtel Payments ad? Going by plain maximum likelihood, everybody will have YouTube on their phones, because it's all Android. What we noticed is that the top app, by likelihood, is the UPI app, and they also have the Paytm app. Does that mean something is going right with their targeting? They're able to reach people who seem to be their demographic. But if this were not the case and they were reaching some other group, then you'd know they had better change their strategy. Okay, how many minutes do I have? Okay. So I hope you're all convinced about this.
So there was a huge need for building the system, and I'm not even looking at all the use cases that can be made out of this, multiple, multiple different use cases. We're seeing interest from OTT operators, advertisers, all kinds of people. Now the question is, this guy has come here and is talking about how he's built the system. How good is the system? Can you even measure and tell how good the system is? Okay, does it get all the ads? Is there a ground truth? What percentage of ads are you able to uncover, from a machine learning perspective? So that's what we wanted to kind of come up with. There is no ground truth, because ads are changing every week. In fact, there is one ground truth. There is an almost semi-government agency called BARC. BARC stands for Broadcast Audience Research Council. And what they've done is they've gone to all broadcasters and said, okay, put watermarks, and then they've installed people meters in people's homes. It's a big team, and in fact they pay people to watch TV, right? So they're gonna watch TV the whole day, one channel per person. And whenever an ad comes, the person makes an entry into Excel and says, okay, this is the ad that's coming up, right? Which is why the data is very old. This is how I want it to be, good. So the data is very old. You don't have a real-time grasp of how good this is, and it's very sparse. So we wanted to come up with a way of telling how good the system is. You may know the terminology of precision, recall, and the F-measure, right? It's kind of important, and it's important for data scientists to have some discipline in how they are doing this work. So for us, precision means: was everything we called an ad indeed an ad? You don't want to arbitrarily call something an ad, right? When you say it's an ad, it better be an ad, right? That is precision. If we say we are a 100% precision system, then every time we make a call and say, hey, ad, it should have been an ad.
If it was not an ad, then you lost precision, okay? Recall is: did you catch all the ads, or are there holes in between? Did you miss some of them? What's a false positive? Something that wasn't an ad but was tagged as an ad. A false negative is something that was an ad but we missed it. Because this is supposed to be an industry-ready system, we said, okay, let's focus on 100% precision. This is also something that I kind of learned while working at Google, which is to always focus on precision more than recall. People say the F-score, the harmonic mean of the two, is something that you should aim for, but industrial systems always go for precision, because humans do not like errors, okay? And we see this a lot. We have our app out there. We have our product out there. One bug and they'll say this is the worst thing that ever happened to humanity, right? You've seen this, right? So people are very, very unforgiving of errors. So you need to build systems that are 100% precise. So we said, let's put a human layer on every ad that goes out of our system. Even though our machine says it's an ad, we better get it validated by a human, okay? Now, then there's a question of how efficient this human is gonna be, but we can park that for now. So we built this tagging utility for them, so they can just play the ad and try to figure out if it's indeed one. And there are some suggestions over there and they just quickly mark it. It takes about one to two minutes to verify one ad. So it's a very scalable thing. We just need like three, four people to get this job done and we are able to be up to speed, okay? Sorry, yeah. So how do we measure recall? One interesting insight that we got was that ads always come together in breaks, right? All ads come together. In fact, that's the way shows are created: how annoying would it be if the show starts out, then after two minutes an ad break comes up, and then again after another two minutes? Cricket is like that.
It's ideal for showing ads, right? So every over is like three, four minutes and then you have an ad break. But typically ads come in bulk. So it's easy to identify if you missed an ad break in the whole timeline. If you missed an ad, then you start seeing holes, okay? So look at this. You had a long content stretch and suddenly our system is saying, hey, ad, and it does so for like three, four seconds; then it's a false positive, right? Similarly, you had a long stretch that is otherwise an ad break, okay? And then somewhere in between there's a hole; this is a false negative. So statistically we converted this into a formula, and we said, okay, our recall is the duration of the full ads that we matched, divided by the sum of the duration of the full ads that we matched and the duration of all the holes that we left out. It's a very good measure of recall, okay? And we use that, and these are actual numbers. We continuously track this for every channel: for Sony, there's 93.7% recall, 86% recall. We can never achieve 100% recall, because we need to learn a new ad, right? And it takes some two, three times for it to come up and repeat, and for our whole system to kind of get set. So we can never get it to 100% recall, but these are actually pretty good numbers in that sense. Okay, Bharath, we are out of time. I'm done. There are ways to improve this and there's a lot to be done. I'm done.
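The duration-based recall described in the talk, matched ad time divided by matched ad time plus the holes inside ad breaks, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the production code: the `Segment` type, the `duration_recall` name, and the 30-second `max_hole` threshold for deciding that a gap between two detected ads is a missed ad rather than real content are all hypothetical. It also assumes precision has already been handled by the human validation step, so every segment tagged as an ad really is one.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds from the start of the channel timeline
    end: float
    is_ad: bool    # True if this stretch was matched as an ad

def duration_recall(segments, max_hole=30.0):
    """Recall = matched ad seconds / (matched ad seconds + hole seconds).

    segments: time-ordered, non-overlapping stretches of one channel.
    A "hole" is a non-ad stretch of at most max_hole seconds that sits
    directly between two matched ads (an assumed definition).
    """
    matched = 0.0
    holes = 0.0
    for i, seg in enumerate(segments):
        length = seg.end - seg.start
        if seg.is_ad:
            matched += length
        elif (0 < i < len(segments) - 1
              and segments[i - 1].is_ad
              and segments[i + 1].is_ad
              and length <= max_hole):
            # Short unmatched gap inside an ad break: likely a missed ad.
            holes += length
    total = matched + holes
    return matched / total if total else 0.0

# Made-up timeline: a break with a 10-second hole, a long show, one more ad.
timeline = [
    Segment(0, 60, True),
    Segment(60, 70, False),      # hole inside the ad break
    Segment(70, 130, True),
    Segment(130, 1000, False),   # long content stretch, not a hole
    Segment(1000, 1030, True),
]
print(duration_recall(timeline))  # 0.9375
```

Here 150 seconds of ads were matched and 10 seconds fell into a hole, so recall is 150 / 160. A new ad that has not yet repeated often enough to be learned shows up as exactly this kind of hole, which is why the measure never reaches 100%.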