Good morning, everyone. I am Aditya from Flipkart, and I am going to talk about guaranteed delivery of advertisements, and specifically about two problems inside it: allocation and forecasting. The talk is structured like this: I will introduce what guaranteed delivery is, and what it is an alternative to, and then go into the two problems I mentioned, allocation and forecasting.

Before I get into guaranteed delivery in detail, we need to know what generally happens in online advertising. When you open the app or look at a web page, an ad slot gets created every time an advertisement has to be shown. That ad slot is sent to a server, advertisers bid for it, and the winner's advertisement gets shown. This is what happens normally in performance-based advertising.

Guaranteed delivery turns the problem around a little and asks: can we add some more value here? The first thing we do is give guarantees. What kind of guarantees? We can tell a brand, say Volkswagen launching a car, that we will give them a million views over the next week; at least a million, rather than exactly a million. These kinds of guarantees make their planning much easier. They can even ask for guarantees on a niche segment of users: say someone is launching a book about Chennai, they can ask to target book lovers in Chennai and get 10,000 ad views from book lovers in Chennai. Advertisers can demand these things and we can provide the guarantees. That is essentially what guaranteed delivery does.

What is in it for Flipkart? When you deliver such guarantees, you can charge a premium for them. But to make sure we honor these guarantees, we shake hands with the advertiser on a penalty: if we do not deliver on a guarantee, we pay an under-delivery penalty. That is the downside, and that is fine.

The bigger problem with guaranteed delivery is not the penalty. It is the fact that guaranteed delivery can never be a system entirely on its own, in the sense that we cannot know exactly how many ad slots are going to open up two or three months into the future in every single targeting segment. So guaranteed delivery sits on top of the existing system, where advertisers bid for ad slots, and it adds to that complexity: the bidding system still has to be there, and we sit on top of it and say we can also provide guarantees. We still do it, because it makes sense. So let me get into the various components that enable guaranteed delivery.
The first thing is audience segmentation. We have all the Flipkart users, and we can segment them in various ways. I am representing this as a tree, but it need not be a tree. Say I divide them by gender, then partition them by location, then by which page they are on, what their interests are, and many other things: where exactly on the page they are, top of the page or bottom of the page, top of the app or bottom of the app. There are many ways to divide the audience on Flipkart properties to show them advertisements. This is called audience targeting; essentially, this is how advertisers tell us who they want to reach. They would say something like "men in Chennai interested in books", just as an example.

Now imagine we apply all such filters. What we end up with is a bunch of circles, which are disjoint audience segments. By disjoint I mean that whenever a user sees an ad slot anywhere on Flipkart properties, that ad slot falls into exactly one of these circles. Each circle contains a bunch of ad slots and is called an audience segment. I am slightly abusing terminology here: just because an ad slot opens up does not mean it will be filled with an advertisement, and what we are really interested in is users viewing those advertisements. But let us assume our efficiency is 100%, we can always show an advertisement, so user views and ad slots are one and the same.

Once we know our disjoint segments, the next thing is prediction: given a day in the future, we should be able to predict how many ad slots are going to open up in each of these audience segments. The numbers on the slide are random examples: 2,000 ad slots would open up in that particular segment, around 15,000 in the next, 120,000 in the third. Estimating how many ad slots are going to open up in each segment is called supply forecasting: we have a supply of ad slots opening up, and we need to forecast it well into the future.

Once we have this forecast, we have advertisers coming to us and saying they want to show their advertisement to a particular set of users, something like "men in Chennai". There are lots of segments that contain men in Chennai, not all of them, but many, so we pick up all those audience segments and connect them to the advertisement on the right of the slide; the square boxes on the right are the advertisements. This process of letting advertisers ask for some amount of demand, "I want this many ad views for this targeting", is called booking.
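To make the "disjoint circles" idea concrete, here is a toy sketch (not from the talk; the attribute names and numbers are illustrative, not Flipkart's actual taxonomy): every ad-slot view maps to exactly one segment key, and the daily counts per key are the raw series that supply forecasting and booking work from.

```python
# Toy sketch of the "disjoint circles" idea: every ad-slot view falls into
# exactly one segment key built from targeting attributes. The attribute
# names and values are illustrative, not Flipkart's actual taxonomy.
from collections import Counter

ATTRIBUTES = ("gender", "city", "page", "interest")

def segment_key(view: dict) -> tuple:
    """Map one ad-slot view to the unique segment it belongs to."""
    return tuple(view.get(attr, "unknown") for attr in ATTRIBUTES)

views_today = [
    {"gender": "M", "city": "Chennai", "page": "home", "interest": "books"},
    {"gender": "F", "city": "Bangalore", "page": "search", "interest": "fashion"},
    {"gender": "M", "city": "Chennai", "page": "home", "interest": "books"},
]

# Daily counts per segment are the raw series that supply forecasting works on.
daily_counts = Counter(segment_key(v) for v in views_today)
print(daily_counts)
```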
Why is booking a complex problem? Because we cannot give guarantees on an arbitrary number of advertisement views; we are restricted by the size of our audience. So booking is a system that has an automated conversation with the advertiser and tells them: based on your targeting, this is the best I can guarantee, and I cannot guarantee more than this. The system that figures out the best we can do is called booking.

Once we have this bipartite graph, forecasts for the supply nodes (the audience segments) on one side and the advertisements (the demand nodes) on the other, we need to distribute the supply. For example, the topmost audience segment might be able to show two different advertisements, the first one and the last one. Deciding which advertisement to show, and with what probability, when a slot opens up in one of the audience segments is called allocation.

I will go slightly deeper into allocation, and then talk about how we model it as a constrained optimization and how we solve it. As I said, allocation is nothing but assigning advertisements to ad slots: I am browsing the app, an ad slot opens up, and some advertisement has to fill it. But this is not done at the level of an individual user or ad slot; it is done at the audience segment level. Instead of deciding per ad slot, we assign proportions to each audience segment: if somebody from this segment comes, show this campaign with this probability. For example, if somebody from the first audience segment comes, we might say: 50% of the time show advertisement one, 40% of the time show the last advertisement, and the rest we could not book, so it goes to the regular bidding system, which figures out the right advertiser to show. That is how the system works. And the picking is random: we do not decide which advertisement goes into which slot based on any other criteria. Once we know the probabilities, we just pick at random and show the advertisement.
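As a tiny illustration of that serving step (not from the talk; campaign names and probabilities are made up, and the "AUCTION" fallback stands for the regular bidding path just mentioned):

```python
# Toy sketch of serving a solved allocation: for the segment a view falls into,
# pick a booked campaign according to its x_ij probability, or hand the slot
# to the regular auction for the unbooked remainder. Numbers are made up.
import random

# x_ij for one audience segment: campaign -> serving probability.
allocation = {"campaign_A": 0.5, "campaign_B": 0.4}   # remaining 0.1 is unbooked

def pick_ad(allocation: dict) -> str:
    campaigns = list(allocation)
    weights = [allocation[c] for c in campaigns]
    campaigns.append("AUCTION")                        # fall back to bidding
    weights.append(max(0.0, 1.0 - sum(weights)))
    return random.choices(campaigns, weights=weights, k=1)[0]

print([pick_ad(allocation) for _ in range(10)])
```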
This is fine, but there are a couple of objectives here. One is explicit and obvious: we have signed up for a penalty if we do not show an advertisement, and we have to minimize that penalty; we do not want to pay the advertiser for not honoring the guarantees. That is the short-term objective. In the long term, we also need to make sure that the audiences we serve are representative of the targeting. "Representative" is a loaded word here, so let me explain with an example. Say a restaurant opens on MG Road and wants to tell people in Bangalore: we are a restaurant on MG Road, come visit us. They come to us and say, I want to show 10,000 advertisements about my restaurant. Great.

Now suppose we see that 10,000 ad slots from men in Electronic City interested in, say, books are going to open up that day, and we allocate all 10,000 to those people and to nobody else in Bangalore. I have honored my guarantee: they asked for people in Bangalore, men in Bangalore are people in Bangalore, men in Electronic City are people in Bangalore, men in Electronic City interested in books are also people in Bangalore. But the advertiser feels cheated, because the advertiser had an implicit assumption that the 10,000 would get distributed across their whole targeting. So just minimizing the penalty does not ensure that the audiences are representative of the targeting. If we want a good long-term relationship with the advertiser, we also want the audience to be representative of the targeting the advertiser specified.

Now let us pose this as a constrained optimization problem. It is a minimization with two terms, and I will go through them one by one. First we have the forecasted supply, represented as s and indexed by i throughout this formulation: the supply nodes, our audience segments, are the s_i. The demand is represented as d_j: we have a bunch of advertisements indexed by j, and those are the d_j. Then we have the representative allocation, theta_ij. In simpler terms: say there are two groups of users, 10 people in group A and 90 people in group B, and I want to show 10 advertisements; the representative allocation shows one advertisement to group A and nine to group B. That is the theta_ij. The x_ij are the actual allocation, what the algorithm comes up with as the best it can do. We want the x_ij to be as close as possible to the theta_ij, as representative as possible; if we are fully representative, this whole term becomes zero and the minimization is happy. That is the long-term objective.

The second part is the short-term objective, and it is pretty straightforward. Every advertisement has an under-delivery u_j. Under-delivery is not something we signed up for; it is simply what we could not show. Even though we gave a guarantee of, say, 100,000, we could only show 90,000, so those 10,000 are the under-delivery for that advertisement. What we actually signed up for is the penalty: we agreed on a penalty p_j per ad that we are unable to show. So the product of the penalty and the under-delivery is our short-term objective; it is exactly the amount we would have to pay out, and we do not want it to be high. So this is the minimization.
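The slide with the formula is not reproduced in the transcript, so the following is a reconstruction of the objective consistent with the description above; the s_i/theta_ij weighting on the representativeness term is a common choice and should be read as an assumption rather than the exact slide:

```latex
\min_{x,\,u}\quad
\underbrace{\sum_{i}\sum_{j}\frac{s_i}{\theta_{ij}}\,\bigl(x_{ij}-\theta_{ij}\bigr)^{2}}_{\text{stay close to the representative allocation}}
\;+\;
\underbrace{\sum_{j} p_j\,u_j}_{\text{under-delivery penalty}}
```

Here s_i is the forecasted supply of segment i, x_ij is the fraction of that segment served to advertisement j, theta_ij the representative fraction, u_j the under-delivery, and p_j the agreed penalty per undelivered impression.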
And it is subject to a bunch of constraints. The first constraint: if d_j is what the advertiser asked for and u_j is what we could not deliver, then what we actually deliver, the sum over i of s_i x_ij, should be at least d_j minus u_j; moving u_j to the other side, sum_i s_i x_ij + u_j >= d_j. The equality case is straightforward: what we deliver and what we could not deliver together add up to what the advertiser asked for. The reason for the greater-than sign is over-delivery: there is no penalty for over-delivering, the advertiser is happy and it does not hurt us much, so when we over-deliver, u_j becomes zero and sum_i s_i x_ij can be greater than d_j. That is why it is an inequality.

The x_ij can be thought of as probabilities. For the very first node, for example: show this advertisement with probability 0.5 and that advertisement with probability 0.4 for a person in this node. They cannot add up to more than one, so sum_j x_ij <= 1 for each segment; it is a commonsensical constraint. For the same reason the x_ij cannot be negative, and similarly the u_j cannot be negative: if we over-deliver, we still call the under-delivery zero, which is also why the demand constraint is an inequality. So those are non-negativity constraints. It is a pretty straightforward optimization problem.

Yes, please go ahead. Good question, should I repeat it? The question was whether theta_ij shows up as a constraint. It does not: theta_ij is just part of my minimization, and all I have to do is minimize that term. In the constraints, all I need is that the sum_i s_i x_ij term, the number of ads I have delivered, plus the under-delivery, equals or exceeds what the advertiser asked for. The closeness to theta_ij is taken care of by the minimization itself; we do not need to write it down as a constraint. Audience: It is not a constraint, I agree; all I am saying is that even if you over-deliver, you may be showing the advertisement to people who are not really the ones the advertiser targeted. I agree. The point is that this entire optimization finds the most optimal solution to this setup; if we served it in some other skewed way we would lose the advertiser, but we cannot do better than the optimum of this formulation. If there are any further questions, can we take them after the talk? Sure.

So this is the short-term objective and this is the long-term objective, and the formulation is pretty straightforward. Notice that, other than the x_ij and u_j, we know everything else beforehand: assume we have the forecast, the booking system has given us the demand, and we know the representative allocation and the penalty. The only unknowns are the x_ij and u_j, so we just append them into one variable vector, and this is more or less how we would solve it in Python.
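A minimal sketch of that "stack the unknowns into one vector and hand it to a QP solver" step, under the same assumptions as the objective written above (two segments, two campaigns, made-up numbers, the s_i/theta_ij weighting as an assumption), using CVXOPT's qp solver, which is what comes up next:

```python
# Minimal QP sketch of the allocation problem: z = [x_ij ..., u_j ...],
# minimise (1/2) z'Pz + q'z subject to Gz <= h, solved with CVXOPT.
# All numbers are made up; the s_i/theta_ij weighting is an assumption.
import numpy as np
from cvxopt import matrix, solvers

s = np.array([2000.0, 15000.0])      # forecasted supply per segment, s_i
d = np.array([1000.0, 5000.0])       # booked demand per campaign, d_j
p = np.array([1.0, 2.0])             # penalty per undelivered impression, p_j

# Representative allocation: spread d_j over the eligible supply in proportion
# to s_i, expressed as a fraction of each segment (both campaigns target both
# segments here), so theta_ij = d_j / total supply.
theta = np.tile(d / s.sum(), (len(s), 1))        # shape (n_i, n_j)

n_i, n_j = theta.shape
n_x = n_i * n_j                       # x_ij variables, laid out row-major
n = n_x + n_j                         # plus one u_j per campaign

P = np.zeros((n, n))
q = np.zeros(n)
for i in range(n_i):
    for j in range(n_j):
        k = i * n_j + j
        P[k, k] = 2.0 * s[i] / theta[i, j]       # from s_i/theta*(x-theta)^2
        q[k] = -2.0 * s[i]
P[n_x:, n_x:] = 1e-9 * np.eye(n_j)               # tiny ridge keeps P pos. def.
q[n_x:] = p                                      # + sum_j p_j * u_j

rows, rhs = [], []
for j in range(n_j):                  # demand: sum_i s_i x_ij + u_j >= d_j
    row = np.zeros(n)
    for i in range(n_i):
        row[i * n_j + j] = -s[i]
    row[n_x + j] = -1.0
    rows.append(row); rhs.append(-d[j])
for i in range(n_i):                  # capacity: sum_j x_ij <= 1 per segment
    row = np.zeros(n)
    row[i * n_j:i * n_j + n_j] = 1.0
    rows.append(row); rhs.append(1.0)
rows.append(-np.eye(n)); rhs.extend([0.0] * n)   # x_ij >= 0, u_j >= 0

G, h = np.vstack(rows), np.array(rhs)

solvers.options["show_progress"] = False
sol = solvers.qp(matrix(P), matrix(q.reshape(-1, 1)),   # CVXOPT wants columns
                 matrix(G), matrix(h.reshape(-1, 1)))
z = np.array(sol["x"]).ravel()
x = z[:n_x].reshape(n_i, n_j)         # serving probabilities per segment
u = z[n_x:]                           # expected under-delivery per campaign
print("x_ij =\n", x.round(3), "\nu_j =", u.round(1))
```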
So essentially, we rewrite this entire minimization problem in the standard form: separate the quadratic and linear terms, represent all the constraints in matrix form, and then use CVXOPT or something like it to solve it. CVXOPT's solver takes these P, q, G, h matrices, and the solution object contains z, from which we can read off the x_ij vector and the u_j vector. The solving is straightforward; the posing is the only real work. So that is the allocation part. The other part is forecasting, where I want to start with a brief look at our data. Yes, please.

Good question. This is a simplified version of the constrained optimization, and we can add a fairness objective to it. Next to the s_i in the representativeness term, we can add a factor F_j, which says how much importance we give to representativeness for a given campaign, how fair we want to be to various kinds of advertisers. One way to compute it: this is a large advertiser, we do not want to lose them, so we want to be more fair to them. That is one scenario; you might compute it differently. But we only have to modify that one term, and F_j is also known beforehand; there is no addition of variables. Fairness is just one more factor in the objective; it does not change the variables or the overall optimization.

That's correct, and it is a good question; it is a property of guaranteed delivery per se. When we give guarantees, what are we giving up? We are giving up on matching advertisements to users, though not completely: we are handing that control to the advertiser. The advertiser goes ahead and says, I want to target exactly these people, with very niche targeting. So the burden of matching users to advertisements moves onto the targeting, rather than onto CTR models or similar things that would find the right match between a user and an advertisement.

The next question was how we go about doing user segmentation. For any given user we know a bunch of attributes. Apply whatever targeting the advertiser needs and put don't-cares in the rest, and what you end up with is a subset of those circles. We then forecast for those circles, and the rest of the story follows. Essentially, any targeting reduces to a subset of the circles, which are the audience segments.

So, forecasting. First, we have these audience segments, these circles, and for each one we have the number of user views it received over time. Let us take daily aggregates, say 54 days' worth of data, and we see that the views fluctuate: some days that audience segment gets a lot of views and some days it does not. That is the basic time series. The next thing is that every audience segment also carries category information; for instance, we know which category this audience segment belongs to.
We also know whether there were any sale events or discounts, or other things that might affect the traffic in that particular audience segment, and that should get captured too. The way we capture it is that alongside the time series we have these blips, which tell us a sale event of this magnitude happened here, another of that magnitude happened there. We can add any number of such variables; this is just to show what the data looks like. We also have the orange bars for the future, because we know what sale events we have planned, but we do not know how the views curve is going to behave in the future. That is exactly what we need to predict, given that we know the orange bars in the past as well as the future.

There are several ways of doing this, and I am going to talk about one. There is least squares regression; there are smoothing models, which are not great at modelling sale events; and there is ARIMA, an old technique but very useful, and very useful in our case. I will go into it in more detail.

Flipkart has tens to hundreds of millions of views in a day. Once we have all those ad slot views, we have this entire hierarchy, or rather grouping. Advertisers might want our running examples, women in Bangalore, or people in Chennai looking at books, and wherever the advertiser wants to target, we need a forecast there: we need to be able to say how many ad slots are going to appear. One simple approach: if all the leaf-level segments are disjoint, then answering a query like "people in Chennai looking at books" or "women in Bangalore" is just a matter of finding all the leaf-level nodes that have those attributes on their paths and adding their forecasts up. That is the straightforward bottom-up approach. The problem with bottom-up approaches is that if there are nodes with so little traffic that we cannot give a reliable forecast, we are going to have a problem.

There is another approach where we go top-down: forecast at the top, then see how it gets partitioned into gender, then into location, and so on. But top-down is also slightly messy, because it assumes there is a hierarchy; even though I have drawn it as a tree, it is not really a hierarchy, I can split on location first and gender later and it still works. It is more of a grouping. So what we use is bottom-up with some constraints. There can be millions of these nodes, because with the number of ways we can target, the number of circles we end up with quickly reaches millions. One of the things we do is look at all the segments with very little traffic and simply not give guarantees on them; it does not make sense to give guarantees on a segment with very little traffic. We take only the segments with enough traffic to support a guarantee. That makes sure the bottom-up approach works, and we can just aggregate back up.
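A toy sketch of that constrained bottom-up idea (segment keys, the traffic threshold and the numbers are all illustrative, not the production values): keep only leaves with enough traffic, then answer any targeting query by summing the forecasts of the matching leaves.

```python
# Toy sketch of constrained bottom-up forecasting: drop leaf segments with too
# little traffic, then answer any targeting query by summing the forecasts of
# the matching leaves. Keys, threshold and numbers are illustrative.
MIN_DAILY_VIEWS = 1_000          # below this we do not give guarantees

KEYS = ("gender", "city", "interest")
# leaf segment -> forecasted views for one future day
leaf_forecasts = {
    ("M", "Chennai", "books"):   12_000,
    ("F", "Chennai", "books"):    8_000,
    ("M", "Chennai", "fashion"):    300,   # too small, gets dropped
    ("M", "Bangalore", "books"): 20_000,
}
reliable = {k: v for k, v in leaf_forecasts.items() if v >= MIN_DAILY_VIEWS}

def forecast_for(targeting: dict) -> int:
    """Sum leaf forecasts matching the targeting; unspecified attributes are don't-cares."""
    total = 0
    for leaf, views in reliable.items():
        if all(targeting.get(k) in (None, leaf[i]) for i, k in enumerate(KEYS)):
            total += views
    return total

# e.g. "people in Chennai looking at books" = men + women in Chennai, books
print(forecast_for({"city": "Chennai", "interest": "books"}))   # 20000
```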
This is specifically for people who have logged in; for people who have not logged in, there are ways to do it too. If there are cookies on the machine we can use them; if the cookies are cleared, it is harder. For mobile phones it is straightforward, because there is an advertising ID, so you can just pick up the ad ID, or the iOS equivalent, and start using that.

So we want to build time series models that can forecast, and I will get into the details of what an ARIMA model is in a moment. Audience: Just one simple question, does this mean that people in different locations could get different ads overall? That's correct. Audience: Say I am a Bangalorean and I am visiting Chennai for some reason; does the ad serving change? It depends. When I say location, there are several locations: one is your base location, one is your current location. Most advertisers target on base location, so even if you are in the U.S. browsing the Flipkart app, if your base location is Bangalore you will still be shown the advertisement. Thank you.

One of the things we need to ensure for ARIMA models to work is that the series is stationary. Why? Generally, when we build ML models, we assume the inputs are independent. But the way ARIMA models work, every data point depends on the previous ones: we learn today's value from yesterday's and the day before yesterday's, and those are clearly not independent. If they are not independent, the assumptions on which we usually build ML models fail. We cannot make the points independent, so instead we ensure something weaker, called stationarity; a lot of the properties we rely on hold on stationary data as well. The AR in ARIMA stands for auto-regression, which means regressing on your own past data, so stationarity becomes important.

What do I mean by stationary? Take a window of the time series and measure its mean; the mean should not change wherever you measure it. That is stationarity in mean. Similarly, the variance over any segment of the time series should also remain constant; that is stationarity in variance, and it is usually achieved through logarithmic transforms. Our data is already stationary in variance; the bigger problem for us is stationarity in mean. What does lack of stationarity in mean look like? If the time series keeps trending upwards, the mean measured at the beginning is one number and the mean measured at the very end is another, so the mean is clearly changing and the series is not stationary.
One quick way to deal with this is to difference the time series. Instead of looking at the series itself, we look at the change from day to day: from yesterday to today I got ten more views, from the day before to yesterday ten more views, and so on.

Now, my real time series is not a textbook straight line going upwards; on a real series it is not easy to say by eye whether it is stationary or not, and we need to be able to decide that before we start building models. The trick itself is simple: if y_t is today's value and y_{t-1} is yesterday's, then instead of the raw values we use the change over time, y'_t = y_t - y_{t-1}, as our series and forecast that change. We can always integrate the changes back to recover the actual values. Sometimes, if the series grows very fast, say exponentially, we have to apply the difference twice, which is second-order differencing. If I apply a single difference to this particular series, the result looks stationary in mean to me, the mean looks stable, whereas the original has a slight upward drift. But is eyeballing the right way to decide?

We have slightly better tools. One is the autocorrelation function: we take the time series and the same series with a lag of 1, 2, 3, 4 and so on, and see how correlated the series is with itself across lags. For the original series the autocorrelation does not fall fast; it decays very gently towards zero, which is a rule-of-thumb indication that the series is non-stationary. If you take the differenced series, the autocorrelation falls very quickly below zero, which is a clear sign that we have reached stationarity.

But, as I said, there are millions of circles, and even after eliminating the small ones we still have hundreds of thousands. We cannot keep looking at these plots and deciding "this series needs differencing, this one does not", and they change every day. So in practice we use unit root tests, which test whether a series is stationary or not. One of them is the augmented Dickey-Fuller test; it is implemented in most packages and we just use that. A sufficiently negative test statistic indicates that the series is stationary; strictly speaking, the ADF test tests for non-stationarity, its null hypothesis is a unit root, and from that we can decide whether the series is stationary or not.
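A small sketch of those checks, assuming pandas and statsmodels (the series here is synthetic, just to show the calls): difference the series, look at the first few autocorrelation lags, and run the ADF test on both versions.

```python
# Sketch of the stationarity checks described above, assuming pandas and
# statsmodels; the series is synthetic, just to show the calls.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf, adfuller

rng = np.random.default_rng(0)
trend = np.linspace(10_000, 20_000, 54)                  # upward-trending views
views = pd.Series(trend + rng.normal(0, 500, size=54))

diffed = views.diff().dropna()                           # y'_t = y_t - y_{t-1}

for name, series in [("raw", views), ("differenced", diffed)]:
    adf_stat, pvalue = adfuller(series, autolag="AIC")[:2]
    lags = np.round(acf(series, nlags=3)[1:], 2)
    print(f"{name:>11}: ADF = {adf_stat:6.2f}, p = {pvalue:.3f}, ACF lags 1-3 = {lags}")

# A strongly negative ADF statistic (small p-value) rejects the unit-root null,
# i.e. suggests stationarity; the number of differences needed becomes d.
```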
There is also differencing that is seasonal in nature. If you have seasonal data, say a yearly pattern and more than a year's worth of data, you difference this October's value against last October's value; that is seasonal differencing. Just like normal differencing, it removes the seasonality.

So, forecasting with ARIMA. ARIMA is actually a combination of several pieces. The I stands for "integrated", which is the differencing we just did; it just means we have to do the opposite, integrate back, once we are done. The AR and the MA are two different models. The first, the AR(p) model, says that the value of the time series at time t depends on its values at the last p instances: a weighted sum of them, plus an unforecastable error term. It is a simple auto-regression; it looks like an ordinary regression, except that the features are the past values of y_t itself. On a differenced series it is y'_t expressed in terms of y'_{t-1}, y'_{t-2}, and so on.

The MA(q) model is slightly more involved; it is called the moving average model. We assume a model and compute its errors, then express y_t in terms of those errors; from the errors we get new thetas, like in a regular regression, but with new thetas we get a new set of errors again. It is a more complex, non-linear model to fit, but we do have algorithms and implementations in place that can solve MA models. An ARIMA model is nothing but a combination of the two: a constant c, the AR terms, the MA terms and the error terms.

Now, what happens to the sale events? They are pretty straightforward. Say we are looking at the past four days, p = 4; then we also take the last four days' sale events, setting them to zero where there was no sale, add those as four more variables to the regression, and solve. Sale events, discounts, everything else: it is just adding more terms like this to this particular regression and then fitting y_t. That is what ARIMA with sale events does.

Audience: For the sale events, if it is the regular series it is fine, but when you apply this on a differenced series, how do you take the sale events into account? Say the sale event is today; when you difference, today minus yesterday, there was no sale event yesterday. You do not have to difference the sale events; you use them directly as extra regressors on the right-hand side, against y'_t rather than the raw y_t. It is just a signal that says how much the value depends on the sale events. Audience: But would the sale events not affect the stationarity? No, they do not change how we treat the actual time series; they only add an additional component on top of it. Okay, can we discuss this after the talk?
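Putting the last few passages into symbols (standard textbook form; the talk's slide is not reproduced here, and the coefficient names are the conventional ones):

```latex
\begin{aligned}
\text{AR}(p):\quad  & y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \varepsilon_t \\[2pt]
\text{MA}(q):\quad  & y'_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \\[2pt]
\text{ARIMA}(p,d,q)\text{ with sale events } r_{t,k}:\quad
                    & y'_t = c + \sum_{m=1}^{p}\phi_m y'_{t-m} + \sum_{m=1}^{q}\theta_m \varepsilon_{t-m} + \sum_{k}\beta_k r_{t,k} + \varepsilon_t
\end{aligned}
```

Here y'_t is the series after d rounds of differencing, the epsilon terms are the unforecastable errors, and the r_{t,k} are the sale-event or discount indicators added as extra regressors.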
Quickly, to conclude, two more slides. This is all fine, but we need to know the right values of p, d and q. We already know d, we can use the ADF test to figure out how many times to difference, but how do we figure out the p and q values? We generally start with p = 0, q = 0. On a non-differenced series that amounts to assuming we cannot predict anything, it is purely random; on a differenced series it is the random walk model, y_t depends on y_{t-1} plus some error. From that starting point we compute a metric, one that we use is the mean absolute percentage error, which I will define in a moment, then evaluate the neighbouring (p, q) values, pick the winner, evaluate its neighbours, and so on until we reach an optimal point. That is what we do in practice to find the right p and q. There are also metrics that take the size of p and q into account, because you do not want to build a model with lots of variables; AIC, the Akaike information criterion, and BIC, the Bayesian information criterion, do take the number of parameters into account.

How does the output look? Say I stop seven days before the end of the 54 days, on the 47th day, learn the model there, and forecast the last week using the optimal (p, d, q) values found for this particular series, which were (3, 1, 1); this is what the forecast looks like. The mean absolute percentage error is computed by taking the forecast and the actual value, seeing by what percentage the forecast is off, and averaging those percentages; that is MAPE. In reality we cannot give guarantees on the point forecast. Because we have to give guarantees, we take the forecast with a confidence interval: an 80% confidence interval tells us the window within which the value will lie 80% of the time. The point forecast is the expected value, but I look at the lower bound of that window, the upper bound is not useful in guaranteed delivery, and provide guarantees on the lower bound. This is my last slide: a very simple Python implementation of an ARIMA model with the time series, past sale events and future sale events, just for reference.
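The slide's code is not reproduced in the transcript; the following is a rough reconstruction under stated assumptions: statsmodels' ARIMA with sale events as exogenous regressors, a tiny AIC-based grid search standing in for the neighbour search described above, and the lower bound of an 80% interval as the number to give guarantees on. The data is synthetic.

```python
# Rough reconstruction of the "simple Python implementation" slide: ARIMA with
# sale events as exogenous regressors, a small (p, q) search, and the lower
# bound of an 80% forecast interval as the guaranteeable number.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
days, train, horizon = 54, 47, 7
sale = np.zeros(days)
sale[[20, 21, 40, 50]] = 1.0                              # past and future sale "blips"
views = 15_000 + np.cumsum(rng.normal(50, 300, days)) + 6_000 * sale
y, exog = pd.Series(views), pd.DataFrame({"sale": sale})

best = None
for p in range(4):                                        # tiny grid over (p, q), d = 1
    for q in range(3):
        try:
            res = ARIMA(y.iloc[:train], exog=exog.iloc[:train], order=(p, 1, q)).fit()
        except Exception:
            continue                                      # some orders fail to converge
        if best is None or res.aic < best[0]:
            best = (res.aic, (p, 1, q), res)

aic, order, res = best
fc = res.get_forecast(steps=horizon, exog=exog.iloc[train:train + horizon])
actual = y.iloc[train:].to_numpy()
mape = float(np.mean(np.abs(fc.predicted_mean.to_numpy() - actual) / actual)) * 100
lower = fc.conf_int(alpha=0.2).iloc[:, 0]                 # lower edge of 80% interval
print(f"order={order}  AIC={aic:.1f}  MAPE={mape:.1f}%")
print("guaranteeable views (80% lower bound):")
print(lower.round(0))
```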
So, to conclude: we need guaranteed delivery for better quality of service and better revenue, and for that we need these two components to work well (there are other components I have not talked about). Allocation has to deliver ads while honoring both the explicit and the implicit guarantees, and forecasting sets the base for guaranteed delivery: allocation and every other algorithm depends on the forecast, and we have to forecast two to three months into the future. That is about it. Questions, please, or do I have more time for questions? Four more minutes, okay.

Audience: In the segmentation part, if city has a cardinality of, say, 500, another attribute 100, and something else 100, pretty soon the overall number of segments will be 500 times 100 times 100; it explodes very fast. How do you control the number of segments, do you use any specific strategy? Good question. What actually happens is that we will have millions of segments, not even millions, even more. Yes, it explodes, but the audience in most of those segments drops to essentially zero: 99.9% of our segments do not have a forecastable number of views; the views per day would be something like 10 or 5, so low that we do not want to give guarantees there. And in the constrained optimization part, exactly, we drop them. How do we drop them? Say we take only the segments that cover 99% of our traffic; they will be fewer than 1% of the segments. Taking 99% or 99.5% of the traffic is a good strategy to make sure the segments are very few.

Audience: The segments themselves are not constant over a period of time, so for forecasting do you precompute the numbers for each segment, or keep them semi-aggregated? No, we compute it for each day, for each segment. And yes, if the segments change, whenever a new store or a new category opens on Flipkart, we start forecasting for that as well; we will not have data initially, so it will not be in guaranteed delivery at first, but it will enter guaranteed delivery once we have enough data.

Most ARIMA implementations also allow us to pass additional variables to regress on; they are called exogenous variables. Instead of giving just the original time series, we can also give these exogenous variables and they get added as part of the equation. That is ARIMAX, exactly, that is what it is called.

And on how slots work: when you are on the mobile app, the advertisement is not there beforehand. As you open the app and scroll, a slot gets created and, within 10 to 20 milliseconds, an ad comes up. We do not perceive it, but what actually happens is that an ad slot opens up, tells the server there is a slot here, show an ad, and the server comes back and says, this is the ad. Exactly; not just on sign-in, anybody, on any page: as the page loads, a slot opens up; it is not about something freeing up because of usage.

Audience: For audience targeting, the base data you use, have you created something like a single customer view to build these segments? How do you come up with the segments you use for audience targeting? There are various ways. We have been doing advertising for quite some time, so we know which segments are of interest to advertisers; the advertisers generally tell us what segments they are interested in, and that gives us a clue about the various cuts. That is one. The other thing is that we look at the traffic in each of them and determine whether there is enough traffic to give guarantees.
If there is not enough traffic, we say okay, there is no traffic, we cannot give guarantees; let us go into the bidding world and let the advertisers bid and win those slots if required. We know the segment cuts and we know what data was there in the past for each of them; based on the volume of past data we say which are the ones where we can give reliable guarantees and which are the ones where we cannot, and we drop the ones where we cannot.

Audience: When you are relying on past data, at any point in time different ads are competing for different segments; isn't the performance of the ads dependent on time and seasonality? For example, in the winter season some offer or sale is going on, so at that time some particular ads were performing well, but as time moves on, the entire history might become redundant. How do you handle this variation? Good question. What I was talking about is a model called ARIMA, and there are lots of additions to it. If I understand right, your question is about the sales forecasting, not really about the mapping part. That's correct: the mapping does not change over time; whatever the seasonality, the volume of traffic in each segment might change, but the mapping, which segments we have, let us assume is fixed. What ARIMA offers is that there are seasonal variants which can also take sale events into account; they are called SARIMAX, seasonal ARIMA with exogenous variables. Essentially it is just ARIMA with a little more complexity, but that is how these are handled. Seasonal ARIMA models actually take seasonal patterns into account; even the differencing they do to get stationarity is seasonal in nature, and that is how they work.

That is something we cannot do, because the booking system needs the forecast to be able to book, so the segments have to be defined before booking. Exactly, we have to deal with all possible segments and truncate them, but the number of audience segments is still large, and that is where the advertiser data you are talking about comes in. Am I out of time? Okay. Thank you.