Today I am going to talk about applications of machine learning in the online advertising domain. Briefly, here is my outline, although some of the topics were covered in earlier talks. It starts with the need for machine learning and big data in general, then the online advertising industry and some of the challenges in that domain. Then I will talk about some of the machine learning applications at PubMatic, and finally I will cover some of the challenging aspects in this domain.

By now everyone knows the importance of machine learning. It is good for any number of applications in our day-to-day life, like spam filtering. A simple application is Google News, which does clustering internally so it can find all the similar stories for you. Then you have recognition everywhere now, from face recognition to action recognition, all kinds of recognition embedded into various devices. If you go to the e-commerce domain, you have product recommendations, and other kinds that people talked about in the previous talks. If you go to social networking, you have friend recommendations. So machine learning has become part of many of the applications that are running right now.

So why big data and machine learning in general? Right now, any application you see generates lots of data, and there have been many advancements in infrastructure, whether you take the internet, hard disks, or anything else. Because of this, in recent years a lot of data has been collected, and you can take advantage of this data to learn many behaviors around any product. It can help a business come up with good product ideas or take product decisions, in terms of finding a better way of selling the product or improving the product design. The more data you have, the more signals you have, so you can build more confident and accurate models. In general, the more data you have, the more corner cases you can cover, and a model built on lots of data can outperform a more sophisticated model built with little data.

If you look at the online advertising industry, there are various settings in which online advertising happens. One is search advertising: for any query you fire on a search engine, along with the organic results you get sponsored results, which are advertisements relevant to the query you issued. These are a source of income for the search engines. Then you have display advertising, which is about showing, along with the content of a web page, advertisements related to that content or to the user. That is traditional display advertising. Then you have social advertising on social networks, which has emerged recently, and then mobile advertising and video advertising. Each of these fields has a different set of problems, the way the advertisements are shown is different, and the data you get is different. For example, on mobile you get a lot of parameters that you do not get on a normal desktop web page.
So each of these fields is different. Now, what is special about the online advertising industry? Because of the problems and the different challenges it has, a new field of science has arisen, called computational advertising. In all of this, the central goal is always finding the best match between three entities: the user, the advertiser, and the publisher. Everyone has their own goals, and you want to find the best match across all three entities. This sits at the intersection of various interesting fields: machine learning, economics, optimization, distributed systems. Problems from all of these fields show up here, so it is a very interesting domain.

If you look at just the display advertising technology landscape, it stretches all the way from the advertiser to the publisher. Towards your left-hand side is the advertiser, and here is the publisher. Between the advertiser and the publisher there are many companies performing different roles. Some of these companies work on the advertiser side; they try to optimize the return on investment for the advertiser. Other companies work for the publisher. Each of them has a different role, and each of them is competing with the others. When any advertisement is shown, these companies may or may not be directly involved, but each of them has its own goal. PubMatic, for example, works on the side of the publishers, so it falls at this end.

So what are the important metrics in ad tech? The metrics people usually care about include impressions: the total number of times an ad is shown. That is the fundamental metric. Then you have unique users, which is the total number of distinct users an advertisement has been shown to. Then there are various ways in which the advertising cost, the revenue model, can work. It could be based simply on how many times the impressions were shown; that is usually referred to as CPM, the cost per thousand impressions. Then you have cost per click and cost per action. Cost per action means: if I am showing an advertisement, I will credit you only if that advertisement leads to a conversion for the product, for example if the user buys the product.

As I said, in the online advertising industry everyone has their own goals and different challenges. Advertisers in general want more return on investment. Demand-side platforms and ad networks try to optimize clicks, or they could also try to optimize the return on investment. SSPs, the ones that work on the side of publishers, try to get higher eCPMs for the publisher. So each of them has its own goals, and sometimes these goals conflict with each other.

Before going into the machine learning problems, I would like to give you an idea of what kind of data we have. In general you have data like a user identifier, such as the cookie that was mentioned earlier. From that you may get details like the location, the OS, what time of day the user is seeing the ad, the language, and many other details.
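To make the data and the CPM arithmetic concrete, here is a minimal illustrative sketch in Python. The field names and numbers are invented for illustration; they are not PubMatic's actual schema.

```python
# A hypothetical impression log record -- field names are illustrative only.
impression = {
    "user_id":     "a1b2c3d4",       # cookie-based virtual identifier
    "geo":         "Pune, IN",
    "os":          "Windows",
    "hour_of_day": 21,
    "language":    "en",
    "site":        "news.example.com",
    "advertiser":  "adv_42",
    "ad_category": "automotive",
}

def cpm_revenue(impressions_served: int, cpm: float) -> float:
    """CPM is the price per 1000 impressions, so revenue scales with count/1000."""
    return impressions_served / 1000.0 * cpm

# e.g. 250,000 impressions at a $2.50 CPM earn $625.00
print(cpm_revenue(250_000, 2.50))
```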
If you go to the advertiser details, you will have all the entities that were involved in displaying the advertisement: which DSP, which advertiser, what category, and the price. Along with this, some extra data is also collected. There could be third-party data providers which can give an idea about the user. They build that picture in different ways, for example from offline data; it could even be bank transactions, which is an extreme case. But it could also just be tied to the virtual identifier, where somehow data is obtained from a large retail provider like Target. So you can get these kinds of additional data points.

Before going into further applications, I will first give you an idea of the general machine learning applications in the ad technology domain. One of the famous problems is click prediction. In general, an advertiser wants their product page to be visited by as many people as possible; they do not just want the user to see the ad, they also want to make sure their web page gets visited. In those cases they want to optimize for clicks. The problem is: from history, you know what kind of impressions have generated clicks and what kind of impressions have not. So this is basically a classification problem. You have the data, positive samples and negative samples with respect to clicks, and you want to build a model which says, given any impression, whether it has a high probability of a click or not.

Then there is the case of sponsored search. When a search query is fired, an auction happens behind the scenes in order to produce the sponsored results. Google first has to find out which sponsored results it should show, so in the background it finds all the advertisers bidding for that keyword. Now, if two advertisers are bidding for the same keyword, how should each of them choose its bid so that it can win? And from Google's side, the question is which advertiser's ad it should select so that it can ultimately charge the advertiser, because there is also the question of whether that ad is actually going to be clicked. So there are two problems here: the selection of the bid from the advertiser's side, and, from the search engine's perspective, selecting which advertiser will give you the best ad in terms of clicks. That is another place where learning is useful.

In general, for ad networks and other advertising entities, what usually happens is that on the first visit the user cares about the advertisement, but as you show the same advertisement again and again, the user loses interest in it. The number of times you show the ad to a user is called the frequency. You want to find the optimal frequency at which you should stop showing the advertisement. For that, you can look at the historical data: up to what frequency users keep giving conversions, and for what kinds of ads. This is useful for finding out at what point I should stop showing this advertisement, so that I can concentrate on showing different advertisements to the user.
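As a concrete illustration of the click-prediction problem described above, here is a minimal sketch using scikit-learn, assuming the impression attributes have already been encoded as numeric features. The features and labels are synthetic placeholders, not real campaign data.

```python
# Minimal click-prediction sketch: logistic regression over impression features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.random((10_000, 20))                  # encoded impression features (synthetic)
y = (rng.random(10_000) < 0.05).astype(int)   # ~5% click rate (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Predicted click probability for each held-out impression.
p_click = model.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, p_click))
```

On real data the features would be one-hot encodings of attributes like geo, site, and ad category, and the predicted probability would feed into bidding or ad selection.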
So another problem is yield management. Like I said, there are different kinds of inventory. Inventory here means impressions, with their different characteristics, and different characteristics can be valuable for different types of advertisements. You want to find the optimal match between the right set of impression characteristics and the advertisements so that you can get high returns. That is the problem of yield optimization. For this, you want to allocate your impressions in such a way that you find the best advertisement for every kind of impression that you have.

To give an example of impression characteristics: these are the data points I explained on the previous slide, like the browser and the geolocation. On the fly, you get a request saying that this user has visited this web page, and now you have to find the best advertisement to show. You have multiple advertisements, so you want to select the best one so that your overall yield is maximized. Basically, say you have a thousand impressions and five advertisers: how do you allocate these impressions so that you get the best revenue? That is an optimization problem.

Another problem is the user segmentation problem. In the online domain, for display advertising, you identify a user by looking at the cookie, which gives a virtual identity. That is the user ID. From this virtual identity you want to build a profile of who the user is, what the interests of this user are, so that advertisers can target users properly. Basically, for every user you want to find an audience segment. An audience segment could be whether the user is a rich person or a poor person, or whether he is interested in sports or movies. You want to somehow build a profile of the audience from these kinds of interests, so that the right advertisers can be matched to the right audience. This involves tracking users across various websites and then doing a kind of clustering: this user is visiting this website, this website, and this website, and all these websites belong to the movie sector, which is a category, so his interests include movies. That is the user segmentation problem.

Then there is the problem of forecasting. Suppose I am a publisher, say Yahoo, and for next year's Super Bowl event I want to make sure that all my inventory gets sold, and that I can make the deals beforehand and plan for them now. For that purpose I need to find out the volume of impressions I should expect. We have the historical data points, like the past years' data, and now we want to find out how many impressions we are going to get at the next event. So these are the kinds of problems you deal with in the online advertising industry.
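As a minimal sketch of the forecasting step just described, assuming a simple linear trend over past events; the numbers are invented, and a real system would model seasonality and richer signals.

```python
# Toy impression forecast: fit a linear trend to past events and extrapolate.
import numpy as np

years = np.array([2009, 2010, 2011, 2012, 2013])
impressions = np.array([8.1e6, 9.0e6, 10.2e6, 11.1e6, 12.3e6])  # per event (synthetic)

# polyfit with deg=1 returns (slope, intercept) of the best-fit line.
slope, intercept = np.polyfit(years, impressions, deg=1)
forecast_2014 = slope * 2014 + intercept
print(f"Expected impressions at the next event: {forecast_2014:,.0f}")
```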
Now I will talk specifically about what PubMatic does and the kinds of problems we work on, using examples from PubMatic. Like I said, PubMatic works on the publisher side: we try to maximize the revenue for the publishers. Or the goal could be a different one, for instance filling as many impressions as possible with some advertisement, because you do not want to lose any impression without showing an ad.

So there are many goals for the publisher, each of which can be different, and we try to optimize these goals. For this purpose we are integrated with many demand sources, that is, advertisers and entities which work on the side of the advertisers; they are the providers of advertisements. We run an auction across all these different sources. From some of them we get bids on the fly: what is the value of these impression characteristics right now, in real time. That is called real-time bidding in online advertising, and it is usually done by demand-side platforms. Whenever an impression request comes, we get the bids from all the demand-side platforms, each saying: for this impression characteristic, I am going to bid this value. We then take the maximum across all of these and give the impression to the best advertiser, the one paying the highest.

Ad networks, on the other hand, are the older technology stack; they do not do anything in real time. They only provide us historical aggregates of values like revenue or impressions, for example: over the last two days you earned this much money. But they do not tell you exactly how much an impression is worth if it comes from India, or specifically from Pune. That kind of granular number is not available. In that case, if you want to run a fair auction, you have to predict how much the ad network is going to pay, based on the historical data.

Pictorially, this is how it looks. Say there are four channels through which the auction happens. You have some direct deals with the advertiser. There are agency trading desks, which also bid. Then there are ad networks, for which we do not have a price available. And then you have the demand-side platforms, for which you have a live auction with a live bid. What PubMatic does is find the best advertiser, the one that can give the best price. Like I told you, in the case of ad networks we do not have any price, so we need to predict the price for every impression. For that purpose we have developed something called an ad price prediction engine. The goal, like I said, is to maximize the revenue for the publisher, but we do not have the granular data that is required: we do not have, for every impression, the price that would be paid. What we have are historical aggregates of impression volumes and prices paid, and from these you want to find out how much each impression characteristic would earn you, so that you can optimize the revenue.

One of the ways this was solved is a user-frequency approach. We noticed that ad networks generally pay less as the user frequency increases: at the first frequency they pay you higher, at the second frequency they pay you less, and as the frequency increases the eCPM keeps dropping, until at a certain cap they will not pay you anything more, which means they are not interested in showing any more advertisements to that user. So the problem looks like this: you have the historical aggregates, that is, the total revenue that was reported and the historical impression counts per frequency, and you want to recover the eCPM_i for every frequency i. The formulation is an equation with some constraints. eCPM here is the effective cost per thousand impressions, from CPM, cost per mille.
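The slide's actual formulation was not captured in the recording; a plausible reconstruction from the description above is the following. Given the reported revenue $R$ for a period and the impression counts $n_i$ at each frequency $i = 1, \dots, k$, solve for the per-frequency $\mathrm{eCPM}_i$:

$$\sum_{i=1}^{k} \frac{n_i \cdot \mathrm{eCPM}_i}{1000} = R, \qquad \mathrm{eCPM}_1 \ge \mathrm{eCPM}_2 \ge \cdots \ge \mathrm{eCPM}_k \ge 0.$$

With several reporting periods $t$, this becomes a constrained least-squares fit: minimize $\sum_t \big(R_t - \sum_i n_{t,i}\,\mathrm{eCPM}_i / 1000\big)^2$ subject to the same monotonicity constraints, which matches the observation that networks pay less at higher frequencies.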
Like I explained, eCPM is the price the ad network effectively pays the publisher per thousand impressions, and the frequency is the number of times the user has seen an advertisement so far. This can be solved with any optimization software, or with genetic algorithms. A question came up about fraudulent traffic: that is altogether a different problem, fraud detection I think is what you are talking about, and this formulation does not account for it. Those fraud detection techniques could be used to factor out such data before estimating the price. Also, in this case there are no clicks involved; it is based only on impressions, only on CPM. If you are paying on a CPC basis, then you are paying for clicks.

Another problem we try to solve is finding out which impression characteristics have higher demand, that is, which are paid the most by advertisers. There are a lot of actions that can be taken once you find this out. If you know that impressions with certain characteristics are very high value, then you could try to get more such impressions onto your platform and earn more revenue. The other way around, you could find that some kind of inventory is not giving you any revenue at all, so you stop taking that inventory and save on your infrastructure. These kinds of business decisions can be taken by making these kinds of predictions. Again, for this problem you have the historical data saying what kind of impression was paid for by which advertisers and how much was paid. If you want to find out whether an impression is high value or low value, you can use this data with some classification algorithm, even a simple one. You can use a decision tree algorithm: you get a tree which says what kinds of impressions have high value and what kinds should be classified as low value. The resulting tree structure can be converted into rules. Having the model in the form of rules is advantageous because it can be implemented very easily, and that matters because in online advertising you have to make decisions very fast, within milliseconds, so this is very important. For example, here is how it looks: at every node you check whether an attribute of the impression satisfies the condition or not, and you take a branch accordingly. If you follow this path and find this combination of attributes, the tree says this is a high-value impression, and the other kinds of impressions are not high value.
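A minimal sketch of the tree-to-rules idea, assuming scikit-learn; the features, thresholds, and labels are synthetic placeholders, not real inventory data.

```python
# Classify impressions as high/low value with a decision tree, then dump the
# learned tree as human-readable rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
# Columns: [geo_tier, hour_of_day, is_mobile] -- illustrative features only.
X = np.column_stack([
    rng.integers(0, 3, 5000),    # geo_tier: 0, 1, 2
    rng.integers(0, 24, 5000),   # hour_of_day
    rng.integers(0, 2, 5000),    # is_mobile
])
# Synthetic ground truth: tier-0 geos in the evening tend to be high value.
y = ((X[:, 0] == 0) & (X[:, 1] >= 18)).astype(int)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
# export_text prints the tree as nested if/else rules, easy to deploy fast.
print(export_text(tree, feature_names=["geo_tier", "hour_of_day", "is_mobile"]))
```

The printed rules can be translated directly into fast lookup logic in the serving path, which is the millisecond-latency advantage mentioned above.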
So another problem we have is detecting outliers. Because this is an online business, we have a lot of data flowing in real time, and if something goes wrong, if there is a drop in revenue or a drop in impressions for some reason, and there is a problem somewhere, you want to take action so that the loss you incur is as small as possible. For that purpose you want to find out whether there is any abnormal data point in your data stream. This maps back to the traditional anomaly detection problem, where you have a continuous historical stream of data flowing in. For example, you have the revenue, or the win rate, or, if you are running a Hadoop cluster, the cluster metrics where you want to find out if something is wrong. You can use these kinds of techniques to find the anomalies and then take action so that any revenue loss is reduced. For this you can use simple statistical approaches. With a moving-window method, you calculate the mean and sigma over the recent data, based on the recent trends, and check whether the new point falls outside a range like the mean plus or minus two sigma.
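A minimal sketch of the moving-window mean plus or minus two-sigma check just described; the window length, warm-up size, and two-sigma band are illustrative choices.

```python
# Flag a new metric value as anomalous if it falls outside mean +/- n_sigma * sigma
# of a recent window of observations.
from collections import deque
import statistics

class WindowAnomalyDetector:
    def __init__(self, window: int = 48, n_sigma: float = 2.0):
        self.values = deque(maxlen=window)  # rolling window of recent values
        self.n_sigma = n_sigma

    def update(self, x: float) -> bool:
        """Return True if x is anomalous relative to the recent window."""
        is_anomaly = False
        if len(self.values) >= 5:  # wait for a small warm-up before judging
            mu = statistics.mean(self.values)
            sigma = statistics.stdev(self.values)
            is_anomaly = abs(x - mu) > self.n_sigma * sigma
        self.values.append(x)
        return is_anomaly

det = WindowAnomalyDetector()
for revenue in [100, 102, 98, 101, 99, 100, 55]:   # sudden dip at the end
    if det.update(revenue):
        print("anomaly:", revenue)                  # prints: anomaly: 55
```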
At this point there was a question from the audience: "Let me stop you at a point where I can ask a question. You are describing a set of techniques, a set of statistical processes and equations that are used in the industry, and you run them on the data. Now the point of machine learning is that you use the machine to learn, to become more efficient or better over time. Using all these algorithms and statistical methods does not matter if it does not translate into better business. There has to be a loop, a feedback loop, somewhere. So far we are talking about a set of techniques; that is just processing data better than before. At some point, will you cover how, by using machine learning in ad tech, your business actually got better? I understand that with better algorithms, better processing, more signals, you get better results. That is good. But the point of machine learning is that it is a superior approach because we use feedback in the system. That is the critical point we would like you to cover."

Let me explain how I understand it. Basically, you learn from the historical data. For example, in this case, you have seen historically that these kinds of impressions are giving you good revenue and those are not. So you can stop taking certain inventory; for example, you decide that this kind of impression is causing a problem for the business because it takes too much infrastructure cost but does not result in any revenue. There you are helping the business by learning from the data which patterns give you good revenue and which do not. The audience member followed up: "So you are saying that if you apply more algorithms, you have a better business? Which means I just have to come up with more data to train on and I will get better models?" Yes, up to a certain extent, yes. That is what it comes down to: looking at the data and finding the best decision that needs to be taken. "So is the AI aspect there only when there is feedback?" It is not in all cases that you have feedback. For example, even in the case of this tree, first you train it on the historical data and then you validate your approach on a new data set, and you see whether it has performed well or not. If it has not performed well, then you modify your parameters. So you do take a signal and use it in your next round of modeling, and over time you can improve your model as well. You will not get the best model the first time, but if you keep watching how your model performs over time, it adapts to whatever latest data you have, picks up new model parameters, and you get better results. "And have you gotten to the adaptation part?" No, not specifically yet. "Okay, then just cover it as you go."

So this is one example of where anomaly detection can be useful. Suppose you find that there is a certain dip in the revenue data; you want to find these kinds of anomalies. Then you can use these techniques, trace the corresponding cause, take action, and save money for the business.

Some of the challenges we have: there are various kinds of impressions, and different parameters are needed in different settings. Sometimes one method will not work for all kinds of data, or for all advertisers and publishers, so you might have to treat different types differently and fit the best model for each of them separately. That becomes very challenging, because the number of publishers and advertisers you work with, and the combinations among them, keep increasing over time, and you have to find the best way to do the analysis. Also, whatever model you build sometimes will not work well all the time, because of noise, too much noise in the data. Take the forecasting problem, like the earlier example with the Nifty prediction: there was too much noise in the data. Unless the same kind of pattern persists, sometimes you will not be able to make any good predictions, and if the data becomes too noisy it becomes difficult to fine-tune the parameters or come up with new methods.

As we deal with very large-scale data, we usually work with big data technologies, and we run most of our algorithms on a big data platform. If the data fits in memory, you can run things on standalone machines, but if you want to use more data, you depend on big data platforms; at the very least, for producing the aggregated or sampled data, you need to depend on these big data technologies. Coming to the implementation part, like I said, we usually either use aggregated data or we sample the data so as to fit it onto a local machine or desktop. In cases where sampling will not do, because you might not cover all the cases, you have to run on the complete data, and there the distributed platforms are very helpful. Some of the tools we use are Hadoop and Hive, and we use R and Weka. There are many other technologies that are becoming available; these are upcoming but still maturing, like the Spark-based MLlib, which sits on top of the Hadoop ecosystem. We are not exploiting it yet, but I think that is one of the good directions for the future. And further, if you want to run R in a distributed fashion, there is RHadoop. Again, these are part of the ecosystem.
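Since the talk mentions sampling or aggregating big data down to a single machine for prototyping, here is a minimal sketch of that step, assuming pandas; the file name and sampling fraction are illustrative.

```python
# Down-sample a large impression log so it fits on a desktop for prototyping.
import pandas as pd

frames = []
# Read the log in chunks so the full file never has to fit in memory at once.
for chunk in pd.read_csv("impressions.csv", chunksize=1_000_000):
    frames.append(chunk.sample(frac=0.01, random_state=0))  # keep ~1% of rows

sample = pd.concat(frames, ignore_index=True)
sample.to_csv("impressions_sample.csv", index=False)
```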
Although we have not really tried it out much, it could be useful if you are dealing with distributed problems. "Is it a distributed R, or is it R on top of Hadoop?" What RHadoop tries to do is divide your task across the Hadoop framework and then bring the result back to you on a single machine. So it internally divides the task. "Hadoop works on MapReduce, right? R may not necessarily map onto that." Yes; for some of the algorithms, it tries to split your problem in a distributed manner, execute each of the tasks on separate machines, and then aggregate the final result, wherever that is applicable.

Any other questions? "In what contexts do you use Mahout versus R? Do you use them differently for different use cases?" Yes, partly because of the evolving nature of these tools, the use case decides it, and it depends on the algorithm. If you want to prototype, or see which algorithm works best, that guides the choice. For example, Mahout has the frequent pattern growth (FP-growth) algorithm; if you want to apply it to large-scale data to find out which attributes occur together, that is useful, and for distributed cases Mahout has good implementations. R is for when you want to do a simple prototype, where your data set fits on your machine and you can run those experiments.

"So going forward, which of these should we move to?" It is not that one of them has everything. If you are going for big data, there is still no single tool that has all the features. Weka, you see, has so many features, but it cannot run in a distributed manner. R is good for prototyping, but again it is not distributed. Mahout is distributed, but it still does not cover all the algorithms. Each has its own pros and cons. Mahout does not have a vast coverage of algorithms, and it is not completely mature; it is still in its early stages, so you cannot say it has capabilities or algorithms as good as what is built into R or Weka. The support side is also still evolving, although it is coming along.

"Are there any combinations where these technologies can be used together, like R within Hadoop?" Basically, if you want to do prototyping, I think R is a good point to start. Python also has libraries, so you can start with it for prototyping and learning purposes, and these can work with the big data technologies. Over time we have evolved: like I said, initially we started with batch processing of the data, and now we are at the stage where we want to move applications toward real time, so real-time training is something we are going to work on. "What about visualization?" For visualization, if you are really interested, I think D3.js is one good framework. So that is everything. Thank you. Thanks so much.