My name is Ekta, and I work as a data scientist at ADNIA. I'll talk about what ADNIA does and some of my work, but before that, because it's really a theme of my presentation, I'd like to start with my background. I have a background in computer science, mathematics, and economics; don't ask me how I managed to get them all. The key thing I keep coming back to when I look at statistics is that there are things we do repeatedly, so how can we productize some of that work and really give a shape to the data products we build?

With that, I'd like to introduce the structure of this talk. I'll start with an introduction to a real-time bidder. Ambuj has already done a really good job of introducing what happens in real-time bidding, so I'll try to cut that down while still setting the context for the rest of the talk. Then we'll talk about system design to deliver performance at scale. Performance here means a business objective: how do I increase my click-through rates, the CTRs, and at the same time bring down my costs? Many of you might be wondering why a data scientist even cares about system design, which goes back to my background. There are three very compelling reasons. One, anybody who has interfaced with data quickly realizes how much time is spent in data prep cycles, which are pretty long, with very little reusability, if any; one of the key questions when I was working on this was how we could shorten that. Two, in a conventional setup there is a really big time lag between when you build a model and when it actually goes into production, and since we're talking about a real-time bidding system, we wanted to see how we could build on the recency of the system. And three, what support systems can we build around the core experimentation so that we don't have to hand-roll reporting, simulation, and A/B testing tasks around every experiment? After system design, we'll talk about two specific data products, and hopefully glance over a third if time permits. More importantly, I'll focus on what choices we had and why we made the choices we did. And finally, I'll put things together and close with building a low-latency, self-learning, self-healing system.

This is all the data we get our hands on. In terms of the number of apps this figure is pretty outdated, we see much more than this, but the only number that really excites me is the 450-plus million ad impressions we see. Not all of them do we serve, but those are the data points we're talking about. So this is the context for a real-time bidding system. At the center of it lies an ad exchange, or an ad network, which brings the buyers and the sellers together. This is similar to what happens in a stock exchange: there are sellers and buyers trying to trade something, except that the commodity being sold is the right to display advertising.
When an SSP, or supply-side platform, which is on the sell side, wants to sell inventory on behalf of a mobile developer or publisher, it routes the request via the ad exchange to the demand-side platforms, which are entrusted with buying this inventory on behalf of the ad agencies or media desks whose campaigns they execute. The DSPs in turn may be connected to data partners or third-party integrations that help them evaluate the quality of a bid. And aside from all this real-time bidding, there are premium-inventory and bulk-inventory arrangements that also happen. When a mobile user logs into an app, it's much like what happens in real-time display advertising: there's a slot that's up for sale, and that slot is what we call inventory in this context. When that slot is up for sale, the publisher signals the SSP, the supply-side platform, to sell this slot, or rather to request bids for it. Ambuj didn't touch on it, but the reason this is an auction is that there's one good, meaning just one slot, and there are multiple potential buyers: DSP 1, DSP 2, up to DSP M. Upon receiving this bid request from the ad exchange, the DSPs evaluate the quality of the bid, together with their data partners, and send a bid response, which carries the actual creative, the advertisement to be served on the mobile device, and the price. It's a second-price auction, meaning the DSP that bids the highest wins the impression but pays only the second-highest price; for example, if one DSP bids $2.50 and the next-highest bid is $2.10, the first DSP wins but pays $2.10.

Before I get into the system design, I want to bring out the same thought again, because I want you to walk out thinking about how we can do things in a richer manner, richer meaning that we spend most of our time doing the actual work of statistical modeling. The bidder, then, is the single decisioning engine, so to say, which sees or has access to all this mobile inventory. Our goal was to build a whole lot of decision systems around it to support all the things I mentioned before. Obviously we need to make the data the bidder sees flow into our systems and eventually reach the experimentation. So we consume this data via multiple Kafka consumers, it then flows to Spark, and we do in-memory processing of these logs in Spark. A couple of things before I show you the rest. There were two pain points I had when I was doing my data prep on Hadoop. One is that for any kind of custom function you want to write, you need to write a UDF, and if not that, then you need to write transformations that reshape your data in ways you wouldn't like to. Spark helped us do away with some of these pain points. Internally, we use Spark for two use cases. One, we marry the data from disparate sources, and I'll talk about what those disparate sources are, and keep this as one state. And two, we build custom jobs on top of Spark with PySpark, Python plus Spark, to productize the data prep cycles I'm talking about. Now, what is this marrying of the data? When this inventory comes in, think of it as the 450-plus million records I mentioned. Obviously, you're not interested in bidding for all of those records.
Even if you were, think of what happens when you're browsing in a mobile app: there are potentially multiple slots open, but for the same user you probably don't want to engage with all of them, even if you had an endless inventory of ads. So it's a funnel. The requests you see reduce to the ones you're interested in; of the ones you're interested in, you win only a part, because there are other players bidding for this inventory; of the ones you win, which is roughly the number of ads you actually show, a part convert; and then there's post-click behavior after that. When we log these events inside our systems, there are three basic collections. One, at the topmost part of the funnel, we log all the records we see, which is the richest form of data; let's call these request logs. Then there are the times when we actually win the impression; let's call these win logs. And then we log the events when a click happened; let's call these ad logs. From a pure classification-modeling perspective, meaning whether or not a click happened, I want to marry these together, and building custom jobs on top of Spark helped us do that in a much better manner; a minimal sketch of such a join appears below. After Spark, the data obviously needs to flow into our data products and experimentation, which I'll cover in the third part of this talk. So you have live feeds flowing in here. Once you have reasonable confidence in the experiment you're running, you want to estimate the business risk you're exposing yourself to. Ambuj rightly mentioned that they estimate expected revenue and so on; we do something similar, estimating the amount of dollars we expect to spend until we learn which variant is performing and which is not, so that we know when to stop an experiment and when to keep going. After that, suppose your decision is something like y = β1·x1 + β2·x2 + β3·x3. How would you want your bidder, the decision engine, to consume that? Do you want that logic to sit inside the bidder, or do you want to find a way to decouple it from the rest of your system so you have the flexibility to update it multiple times over? That is why we use Redis, which is just a key-value store. Another reason we use Redis is that one of the first things we did as a company was to build user profiles. Assuming we have upward of 250 million user profiles, how do you want a bidder that works at a latency of three to six milliseconds to consume them? That is where we already had some stickiness toward Redis for the same kind of use case. So we enable the pipe so that the bidder gets all it needs, meaning β1, β2, β3, and so on, but the logic as such resides in the bidder, again keeping the flexibility to change it purely from the statistical side.
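As an illustration of the kind of custom PySpark job described above, here is a minimal, hypothetical sketch of marrying the three log collections into one labeled dataset for click modeling. The paths and column names (request_id, win_price, and so on) are assumptions for the example, not the actual schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("click-training-set").getOrCreate()

# Hypothetical log locations; the real collections are the request,
# win, and ad (click) logs described in the talk.
requests = spark.read.json("hdfs:///logs/requests/")    # every bid request seen
wins     = spark.read.json("hdfs:///logs/wins/")         # impressions we won
clicks   = spark.read.json("hdfs:///logs/ad_events/")    # click events

# Keep only won impressions (the ads actually shown) and attach a
# binary label: 1 if a click was logged for that impression, else 0.
training = (
    requests
    .join(wins.select("request_id", "win_price"), on="request_id", how="inner")
    .join(
        clicks.select("request_id").withColumn("clicked", F.lit(1)),
        on="request_id", how="left",
    )
    .withColumn("clicked", F.coalesce(F.col("clicked"), F.lit(0)))
)

training.write.mode("overwrite").parquet("hdfs:///datasets/click_training/")
```

The same join, run as a scheduled job, is what turns the funnel described above into a reusable data prep step rather than a one-off script.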
Because you want a good measurement system after this, we built an A/B testing framework where, for each piece of inventory that flows in, we select the experiment, because we run multiple experiments at the same time, and then assign the target group that this inventory should fall into. So the experiments are live. Now you want reporting, so we have a feedback loop that helps us learn how our models are doing. And tying back to the same point, because that feedback loop runs at about thirty seconds to a minute, we are able to update these snapshots in Redis in a matter of seconds, if you will. In terms of what you just saw, I want to quickly touch on the responsibilities of each of the key pieces. At the center of it you obviously need a data pipe that feeds data, one, to the experiments, which own some of the statistics, and two, to the bidder so it can consume what the experiment has just learned. Apart from that, it does logging, maintains state, and notifies the system, because there are broken Kafka pipes and so on. It updates the snapshots for the experiments, which basically do three things for every piece of inventory that passes the pre-targeting checks: calculate the probability to convert, decide whether or not to bid, and decide at what price. As inputs, the experiments consume the feature engineering. By feature engineering I basically mean: if I give you the epoch time, which is just a number, what can you make out of it? Our goal right from the start was to productize some of this; I mentioned writing UDFs, or user-defined functions, earlier, and this is exactly what we wanted to get done in Spark. Metadata extraction I'll show you when we get to one of the user-agent examples, so I'll leave it for that. After this, we simulate the dollar spend and the minimum sample size; I think we run at 90% confidence. That gives us an acceptable risk and timeframe, after which you signal to the experiments that you're ready, you test it on the sandbox, and it goes live on the bidder. The bidder obviously needs the decisioning logic, and as inputs it does all the pre-targeting checks. I mentioned we are in location-based mobile advertising, so some of these checks could be that an advertiser wants to run a campaign only in Australia, or inside Melbourne, or inside a very hyper-local geography; the logic for those pre-targeting checks resides there. A/B testing, of course, because reporting has to follow, and it needs to read the state from Redis. Apart from this, I mentioned that we do simulation to estimate the business risk, but we also saw a use case where, say, we defined the split as 40, 30, and 30, making up 100%, but in one of those 30s the model was rejecting a lot more inventory. Anyone who works in digital display advertising will agree that burning inventory is really your opportunity cost. So we wanted to be in a situation where, if a model rejects that inventory, there is a concept of a universal sink, meaning it goes back to, say, the status quo or the control group. That's what the override, target-group rollback does, and then there's logging and reporting. The A/B testing basically splits the traffic across all the experiments, and inside an experiment it splits the traffic into target groups; after that, you go live and there's reporting. A minimal sketch of such a split with rollback to control follows below.
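As a rough illustration of that traffic splitting, here is a hypothetical sketch of deterministic target-group assignment with a rollback to the control group when a model rejects the inventory. The hashing scheme, the 40/30/30 shares, and the function names are assumptions for the example, not the production framework.

```python
import hashlib

# Hypothetical split: target group -> share of traffic (40 / 30 / 30 as in the talk's example).
TARGET_GROUPS = {"control": 0.40, "model_a": 0.30, "model_b": 0.30}

def assign_target_group(request_id, experiment):
    """Deterministically bucket a bid request into a target group."""
    digest = hashlib.md5(f"{experiment}:{request_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10000) / 10000.0  # pseudo-uniform in [0, 1)
    cumulative = 0.0
    for group, share in TARGET_GROUPS.items():
        cumulative += share
        if bucket < cumulative:
            return group
    return "control"

def decide_group(request_id, experiment, model_wants_to_bid):
    group = assign_target_group(request_id, experiment)
    if group != "control" and not model_wants_to_bid(group, request_id):
        # Override / rollback: inventory rejected by the model falls back to the
        # control (status-quo) logic, the "universal sink", instead of being burned.
        group = "control"
    return group
```

Hashing on the request keeps the assignment stable and stateless, so the bidder can make the same split decision at millisecond latency without a lookup.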
In the reporting, which was also something we built from scratch in-house, we had a pacing rate, which is the rate of inflow of my inventory, the rate at which I'm bidding, and the rate at which I'm winning what I bid for. This gives us very good indications of demand and supply signals: it would be intuitive to assume that if the win rate is very low but the rate of inflow of inventory, the pacing rate, is high, that signals that the app, or that single piece of inventory, is in high demand. So maybe it's not a very good idea to go for it at the price we're talking about, or maybe it's a good idea to revise your prices. Among the standard metrics, we also measure the percentage lift, because it makes us look good. And we have a low-latency feedback loop, as I mentioned before. Another thing I personally learned when we built this was that whenever we think of building something from the statistical side, we tend to ignore the input pipes that already exist for us to consume. That was one of our learnings. We had a product we were exposing to our advertisers to manage their campaigns, and we found that when we tweaked some of the data we were aggregating in those APIs, we were able to build reporting with just bare-minimum statistics, feeding off the same pipes. So there was no new data prep and no new APIs written from scratch.

This is basically about all the choices we had and the choices we made. I said earlier that performance in this context means how you are going to meet your business objectives. In terms of the inventory, maybe we think about getting the right app, an app in this context being real estate. We know that apps are not created equal; some apps give us high engagement compared to others. So can we build an app ranking system from all the attributes we can extract from an app and then tune our bidding strategy when seen from an app perspective? Or we can look at it from a right-timing perspective. I mentioned there are demand and supply signals. If we look at them at a macro level, there is the rate of inflow of inventory and there is the rate at which you win your inventory, which is your win rate, and of course there are performance signals like your click-through rate, your cost per click, and so on. We know for a fact that in stock markets, for example, you should go against the market. So staggered inventory bidding was one thought, it's not even a product right now, for us to get there. What it basically does is learn from what happened in the previous delta-T snapshot, productize on that, and then push what it has learned for the bidder to execute against that much inventory. You can think of it as allocating your budget, or deciding how to bid along these multiple dimensions, one delta-T window at a time; a small sketch of those window-level signals follows below.
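To make those demand-and-supply signals concrete, here is a hypothetical sketch of the kind of per-window statistics a staggered approach would learn from. The field names and the simple price-revision rule are assumptions for illustration, not the actual logic.

```python
from dataclasses import dataclass

@dataclass
class WindowSnapshot:
    """Counters aggregated over one delta-T window (e.g. 15 minutes) for an app."""
    requests_seen: int   # inflow of inventory (pacing)
    bids_placed: int
    impressions_won: int
    clicks: int
    spend: float

def signals(s: WindowSnapshot) -> dict:
    bid_rate = s.bids_placed / max(s.requests_seen, 1)
    win_rate = s.impressions_won / max(s.bids_placed, 1)
    ctr      = s.clicks / max(s.impressions_won, 1)
    cpc      = s.spend / max(s.clicks, 1)
    return {"bid_rate": bid_rate, "win_rate": win_rate, "ctr": ctr, "cpc": cpc}

def suggest_bid(s: WindowSnapshot, base_bid: float) -> float:
    """Toy rule: high inflow but low win rate hints the inventory is in demand,
    so either step the bid up for the next window or walk away; the thresholds
    here are purely illustrative."""
    sig = signals(s)
    if s.requests_seen > 10_000 and sig["win_rate"] < 0.05:
        return base_bid * 1.10
    return base_bid
```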
In-app engagement was a prototype we did on four apps, the four largest apps actually. What we wanted to see was, one, does it make sense to engage with the user only when he's just beginning to wean off within that app, because nobody opens an app to look at the ads, right? And also, there are key moments in an app. Ambuj talked about page loads; there's something similar in an app when a user is transitioning from one page, so to say, to another, and there's stickiness on certain pages. Can we capitalize on some of these moments and show the ads there, because, by the fact that the user is going to stick there longer, the engagement will be a lot higher? Or we look at it from a right-user perspective and say, look, we do behavioral profiling, meaning we identify who the students, professionals, homemakers, and travelers are, and then we activate those segments as and when an ad campaign comes. For the power users, this was also something that went live: we break the users down into three segments, people who engage a lot with us, whom we call power users; not-interested people, who don't engage; and neglected people, whom we have not engaged, meaning we have not shown ads to. In the mobility and affinity analysis, we look at the places people are checking in to and the apps they're engaging with, and we see if we can assign a relevance basket to each of these users so that we can better match the ads we serve them. Or we look at it from the right-price perspective. This was one of the first things we did, where we consume all possible signals and just try to get the pricing piece right. Here, again, we do two things: we identify the propensity of the user to engage, and based on that propensity we try to find the price we should pay. Or we look at it from a right-creative perspective, meaning we don't expose the user to the same creative again until a windowing period has passed; or we see that this user has engaged in the past with toothpaste and cosmetics, so maybe he's a consumer-goods kind of profile, and we show him creatives for products that belong to consumer goods. Or we say that the place where you're seeing the ads has a lot of impact on your propensity to engage, which is where geo-relevance comes in.

The first thing I want to talk about here is the metadata we skipped earlier. You see there's a user agent here. The only interesting thing in this user agent, which is where metadata really comes in, is the model string, GT19205. A human knows this is a Samsung, and probably a big device, it sounds like that, but how does a system know, and that too in real time, when it has to make the bidding decision? When we started modeling, and we know that when you're modeling you want each discrete class to carry at least 5% of the population mass, we saw that the full user-agent strings differed only minutely, so they would not fall into one of these discrete classes. So what do we do? We take the user agent, encode it to query an external API, and from that we built an in-house data store with a collection of about 35 to 40 thousand mobile devices.
We then reverse-engineer, from the user agent, the actual token where the mobile identifier occurs. What that lets us do is: now I have a name for that user agent, I have a master class, which is Samsung, I have the attribute of whether or not it's a touchscreen, I know it's not a tablet, and I know the height and the width of the phone. One very intuitive reason why height and width matter: think of when you're browsing on a mobile; if most of the area on the phone is covered by an ad, are you more likely to engage? Yes. That's what we actually saw, which is why we wanted to specifically extract the width and the height. There are some exchanges that also pass you these master classes, Samsung, Sony, and so on, but my experience has been that in most cases, except for devices like the iPhone, that data was cut up. So in the signal we see: the user, which is hashed values of your device ID, your IDFA, your Android ID, and so on; the app name; the app category; the epoch time, from which you can derive the day of the week, the time of day, and so on; the creative attributes, meaning it's a 50 x 320 creative being served; and, because we model on historical data, the win price that was generated for that request for impression; and then country, latitude, and longitude.

So the dynamic pricing model consumes these signals, expecting that your probability to click depends on your engagement within a certain app and within that app category. The app category, as we saw, was IAB1, 24: entertainment, social networking. This is one of the very nice things about the Internet Advertising Bureau, because it really maintains this list. One of the bad experiences we've had: say there's a very popular classic soccer app; when we look at its category, it is listed as a news app and also as a sports app, which in some sense introduces a little bias into your modeling, but that is what we accepted when we first developed this model. Then we look at the length of the session, length meaning in the dimension of time, and the depth, meaning the total number of requests for impressions we see for that user's inventory for as long as he's in the system. And obviously we had to introduce throttling there, because we assumed a user is not going to stay forever, but some users did. Then we have engagements with creatives. A quick note: we didn't use all of these, this is more of a template, if you will, of creative attributes: what the background of the creative is, whether or not there was a picture, whether or not it was localized, meaning if we are serving in Indonesia, whether or not it had Indonesian text. We did use the dimensions of the creative and so on. Then you could potentially have engagement with a vertical, meaning consumer goods and the like. It is intuitive to believe that maybe financial products and consumer goods have a higher probability to click compared to, say, cars, because people don't buy cars all the time.
Then you have user profiles, where you're basically looking at students, affluence, and so on; you have the handset attributes I just talked about; and obviously time of day, day of week, and whether there are seasonal trends you should correct for. After this, getting to the price: the basic assumption we had was that even if I get a probability of one and I just multiply it by my status-quo price, all I get back is the status-quo price, so I should do something different. I also believed that if I break the probability down into three baskets, zero to 0.33, 0.33 to 0.66, and 0.66 to one, they should impact the price differently. This idea comes from the paper I have referenced there; I think it's from Media6Degrees, if I'm correct. What it is basically trying to do is estimate a stepwise function: you find some constants, where the constants are nothing but your base price, multiply them by the probability of click, and hence find the price. Other than that, we also had some capping conditions, which we learned from our operations side, to cap the price when we didn't have a lot of trust in the model itself. This was modeled as a logistic regression with L1 regularization, where you're basically saying, look, I want to penalize the larger weights more. One of the best things I learned while doing this was that rather than training on the whole dataset, you probably want to bag. The other thing was that we typically ignore what we learned in the previous pass; we just say, okay, we do 10-fold cross-validation. But why not reuse what we learned in the previous pass? In Python this is done with start parameters: if you memorize what you learned previously and use it as the starting state, you converge faster and you get a better fit along multiple dimensions, like area under the curve. Another thing we did: say I learn this 1,000 times over; for each of the 1,000 runs I store the parameters along with the start parameters, and I store the goodness of fit, like the AIC criterion. For the final parameters, you just weight the per-run parameters by these goodness-of-fit weights and average them; a rough sketch of this warm-started, bagged fit follows below. In terms of the variable importance, I know it's pretty hard to read, but we saw that the phone by far stood out in terms of impact, along with the creative height and width. To translate the creative height and width, the easy way to see it is: what fraction of my phone's dimensions does the creative take up? Then we had other signals, like interaction effects of time with country, because we were logging everything in one time zone and we wanted to bring the time-zone effect into play, plus app classes and some of the other signals I talked about, like session depth. For problem number two, I talked about the demand and supply signals. Say you have $10,000 and you want to execute a campaign on behalf of a lot of advertisers. One very naive approach would be to start the campaign and just wait for it to end prematurely, or to have a fluctuating budget, or to pace it uniformly, and so on.
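Here is a minimal, hypothetical sketch of the warm-started, bagged L1 logistic regression and the stepwise price multiplier described above, using scikit-learn as a stand-in. The bag sizes, bucket constants, cap, and goodness-of-fit proxy are assumptions for illustration, not the production values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bagged_warm_start_fit(X, y, n_bags=10, sample_frac=0.3, seed=0):
    """Fit an L1-regularised logistic regression on bootstrap bags,
    warm-starting each fit from the previous coefficients, then average
    the per-bag coefficients weighted by a simple goodness-of-fit score."""
    rng = np.random.default_rng(seed)
    model = LogisticRegression(penalty="l1", solver="saga",
                               warm_start=True, max_iter=200)
    coefs, weights = [], []
    n = len(y)
    for _ in range(n_bags):
        idx = rng.choice(n, size=int(sample_frac * n), replace=True)
        model.fit(X[idx], y[idx])            # reuses previous coefficients as the start state
        score = model.score(X[idx], y[idx])  # stand-in for the AIC-style weight in the talk
        coefs.append(model.coef_.ravel().copy())
        weights.append(score)
    return np.average(np.asarray(coefs), axis=0, weights=np.asarray(weights))

def price_from_probability(p_click, base_price):
    """Stepwise pricing: each probability bucket gets its own constant
    (illustrative multiples of the status-quo price) multiplied by the
    click probability, with a cap learned from operations."""
    if p_click < 0.33:
        constant = 2.0 * base_price
    elif p_click < 0.66:
        constant = 2.5 * base_price
    else:
        constant = 3.0 * base_price
    return min(constant * p_click, 1.5 * base_price)
```

The warm start is what buys the faster convergence mentioned above; the weighted average over bags is one simple way to combine the stored per-run parameters into final coefficients.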
Before I close, I'll run through this slide. This was one of the very interesting pieces of work we did, and it's where the beauty of Spark really kicked in. We were trying to build a mobile app ranking system. The conventional approach was to use the number of downloads, the app category, and the app-store charts, but there was still a lot of noise, and we didn't see a trend for CTR. The very important signal we did see was that the win rate for an app, which is basically a signal of its quality, really depended on these other factors. The guiding principle we had was: if an app is really good, meaning it delivers, it should be in high demand and hence should show up with a low win rate in your live feed. Based on that, we first checked whether CTR really depends on these signals. Once we saw that was a good fit, we modeled a delta, the price spread, meaning over and above the status-quo price, how much should I bid, and then reverse-engineered the price back. So I learn the delta for T+1 based on these other signals from the last time period, again in Spark, and then predict what my bid price should be. Surprisingly, this was one of the simplest models we built, in the least amount of time, and it just worked. A couple of notes on this. The clicks I'm aggregating over, say, the last hour or a 15-minute window could come at any time: you could have a click now, a click ten minutes later, or a click recorded as much as 12 hours later. So there is an outcome class that is yet to come. Also, a low win rate was partly because other people believed the app was high-performing, so there was a bias there. And there are very strong time-of-day effects, meaning your win rate will be very low when the day is just starting. So we worried that by relying only on the last time period we were penalizing a rock-star app and overweighting a not-so-good one. Other than that, there were issues like part of our Kafka pipe breaking, which left us with cut-up snapshots all over, and we didn't have any logging mechanism back then. So we said, let's look at the previous six snapshots and decay them with an 80% factor, mostly because you want different time snapshots to decay at different rates; a small sketch of that decayed aggregation follows below. That was good. Then we did a stage three, where we modeled it like a cross-sectional time series: we look not only at the CTRs but also at these other indicators, the win rates and all the other parameters I mentioned before. The effect was that the CTR learned this way, versus the actual CTR that could have been predicted by the previous approach, everything else equal, was worse. But it helped us guard against system failures, broken pipes, and the outlier cases, like the rock-star app underperforming and the really bad app overperforming in a certain window. We are integrated with four exchanges. For one of these exchanges, the data we had was about three months old, and we didn't have a lot of historical bidding points. For this, and I'm going to just brush through it, we did a Levenshtein-distance mapping, meaning that now that we had learned which apps are high-performing, we could translate that to this ad exchange. I'm not going to go into the rest of them.
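As an illustration of that decayed aggregation over previous snapshots, here is a hypothetical sketch; the 0.8 decay factor matches the 80% fraction mentioned above, while the data layout and values are assumptions.

```python
def decayed_signal(snapshots, decay=0.8):
    """Combine per-window app signals (most recent first) into one estimate,
    weighting older snapshots less, so a single broken or unusual window
    does not dominate the ranking."""
    weighted, total_weight = 0.0, 0.0
    for age, value in enumerate(snapshots):   # age 0 = latest window
        if value is None:                     # e.g. a cut-up snapshot from a broken Kafka pipe
            continue
        w = decay ** age
        weighted += w * value
        total_weight += w
    return weighted / total_weight if total_weight else None

# Example: win rates of one app over the previous six 15-minute snapshots,
# with one snapshot missing because the pipe was broken.
recent_win_rates = [0.04, 0.06, None, 0.05, 0.07, 0.06]
print(decayed_signal(recent_win_rates))
```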
And this is the last slide, about what we really learned. Be as close to the data pipes as is humanly possible, meaning own the statistics to the last mile; create all these support systems around the core data experimentation platform; and always cover the baseline and maintain that universal sink so the business objectives are not hurt. Log everything: one thing we did was enable SMTP within the Python scripts themselves, so an email shot off every time there was an outlier case in the live feeds. And the common themes I want to point out: reuse your data prep, loosely couple each of the systems, fail fast and fail forward, and your data is more than just your data. That's all I wanted to talk about.

Okay, so we have about another five to seven minutes for Q&A, and for anybody asking a question right now we have some special gifts to give away from Nexus Venture Partners, Flipkart, and BloomReach as well. So if you have any questions, please keep them coming.

Could you talk a bit about how you went about collecting ground truth, and what volume of data you used: did you take a day's worth for one app, or across all apps for a week?

Are you talking about the in-app engagement? Right. Okay. I was anticipating that question, so I have a graph for it, because it's a very good question. When we look at the difference between the clicks and the impressions in seconds, this is what happens. This is the data for one app, and on this axis it's log clicks, because if I plotted raw clicks it would just shoot all over. What I was trying to do in that exercise was to look at, if I chop off the time at 30 seconds, what fraction of the clicks I would capture, like a probability density function. But more importantly, within an app, if I look at some of these peaks, where a lot of inventory is being requested, that can give me an indication of how the users are really browsing. For this specific exercise, because I'm looking at only one app, I wanted seven days' worth of data, seven because you want time-of-day, day-of-week, all sorts of effects, and then you sample from that. So this is data for 2,114 users, which is very small, and if you take two raised to the power twelve or so, that's roughly the number of requests you see per user; multiply 2,114 by two to the twelve and that is the total number of requests. So we just sampled to get that. The main point I want to highlight is that when you look at the peaks at 11 and 32, 11 being the time of one peak and 32 the other local peak, you see that more users come in between these two points. The difference between them is the time for us to engage with them. And I also know that if I take 32 minus 11 and put that number back, then for the number of impressions that happen in that window, whatever it is, I'll capture this much fraction of the clicks as a fraction of the total. That was what I was trying to do.
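As a side note on that cutoff analysis, a minimal sketch of computing the fraction of clicks captured within a latency window might look like this; the click-delay values are made up for illustration.

```python
import numpy as np

# Hypothetical click delays in seconds (click time minus impression time).
click_delays = np.array([2, 5, 8, 14, 25, 31, 40, 95, 600, 43200])

def fraction_captured(delays, cutoff_seconds):
    """Fraction of observed clicks arriving within the cutoff window,
    i.e. the empirical CDF of click latency evaluated at the cutoff."""
    return float(np.mean(delays <= cutoff_seconds))

print(fraction_captured(click_delays, 30))  # 0.5 with the toy data above
```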
Okay, next question, right here. For the logistic regression, you said you initialize it with the last solution, right? But doesn't logistic regression find a local minimum? If you keep reusing the last point, might you get stuck in that local minimum?

What I learned, because I was running it on my local box, which has a GB of RAM, was that anything upward of about 1.38 million data points was very slow. So that was one criterion for reusing the parameters from the previous pass. The other, I would say, was more or less the wisdom of the crowds: at some points I was running into matrices that were not invertible, because if you look at all the data at the same time, you have to do something like a QR decomposition to recover the matrices, since in a sufficiently large dataset there are data points that are very highly correlated, and even if you adjust for that, you can still end up with that error. That was what I learned, and that's why I used the previous parameters. There's actually a very big thread on Stack Overflow about this, to help understand it; I know because I posted it there.

Sure, from a computational point of view you gain a lot, but if you use the same starting point again and again, you'll converge to the same region, so you'll be stuck in the local minimum and never get out of it, right? I think that could be true.

Okay, we have time for probably one more question. We got one more. Do you do any app-similarity kind of concept? Say you saw an app doing very well, in the sense that there was a lot of competition and your bid was not winning, which implies the app was a good performer. Do you do any app similarity where you can bid higher on a similar app, so that the chances of winning there are much higher?

We didn't do any app similarity as such. The only signals we consumed from the app were along the lines of what happens in the marketplace, the win rate and the bid flow, because we were using those data points as a proxy for what we wanted, namely how much over or under the status quo we should bid. Earlier there was not really a sophisticated differentiation with respect to what price we should bid at. So it was more those other signals, which could also be, say, being on iOS or iPhone, because it's more or less common knowledge that people on iOS somehow convert more in terms of click behavior. But short answer: no, we didn't do any similarity.

Okay, we've got time for one more question only. We don't do any spam detection right now; it's more heuristic. What we realized was that some of these apps, I don't want to name them, were really caching these impressions and never even serving them. There's a colleague at our company who identified that; maybe he'd be the right person to answer this. How do you identify it? You look at the tracking-pixel hits, and in more cases than not the ad didn't actually show up. That's something that can point you in that direction.
Other than that, we did a windowing concept: for each impression that has happened, it's humanly impossible for somebody to click at almost the same instant the impression was served. You can assume it was a mistake once, but in more cases than not it shouldn't happen. So what we did was, I'd say, more like memorizing some of these cases, and then we introduced a minimum time lag that has to elapse between when you show an impression and when a click happens; any time an app violates that, you log it, you learn, and you stop serving ads on that app. But there are other folks who can answer that better. Okay, perfect, thank you very much. Everybody give a warm round of applause. Thank you.