Okay, good morning, everybody. Today I will be talking about content marketing and natural language processing. Here is a broad outline of the talk: I will first go over what content marketing is, and then discuss some key challenges in content marketing, particularly around content velocity and channel explosion. Content marketing is one of the topics currently generating a buzz across digital marketing, and that buzz has been around for a couple of years now. One of the key questions people always ask is what exactly content marketing is and how it differs from the traditional marketing we have always done. There are several definitions of content marketing, and I picked one given by the Content Marketing Institute, which is essentially a repository of materials on the subject. It defines content marketing as a strategic marketing approach focused on creating and distributing valuable content to a clearly defined audience in order to drive profitable customer action. For anyone in digital marketing, or in marketing at all, this does not sound very different from where traditional marketing has always been: traditional marketing has always been about crafting messages for a specific audience you want to target. So what are the differences between traditional publicity and content marketing? In traditional publicity you always talk about your own brand and the promotions you run as part of that brand. Content marketing is different in that it tries to get the brand to connect with its final audience.
In other words, it first identifies what my audience is interested in and how I can cater to those interests, so that the customer becomes loyal to my brand. Let us take a quick example to make this difference clearer. Say I am the brand Fitbit. Fitbit, as people may be aware, makes fitness devices ranging from simple step counters to full fitness watches. If I were the brand Fitbit, something like the example on the left is what I would do for traditional publicity: I would list the different products under my brand, their salient aspects, and how my customers can use them. That is traditional publicity. With content marketing, on the other hand, what I want to do is go directly to my audience, to my customers, and give them what they are looking for. Anyone interested in my product, if I were Fitbit, would be someone very interested in fitness, so content like healthy tips for the summer, exercise tips, or regular workout routines could be very useful for my audience. What I get as a brand is this: whenever my customers think about anything related to fitness, I want them to be reminded of the brand Fitbit. That is the sort of loyalty you want to generate through content marketing.
Now, there is another group of people who will say that this is how marketing has always worked. There are different stages in content marketing: first you strategize, where, as in the previous example, if my audience is looking for something around fitness or diets, I decide what I can post about; then you create that content, in this case the healthy tips for the summer; then you optimize it for different channels and publish it. Once it is published, you also need to promote it so that it reaches your audience across the different channels, and finally you measure how the audience received it, which feeds back into your cycle of content creation. Looking at this cycle and the previous example together, people might again ask what is really different about this whole concept of content marketing versus traditional publicity. The answer is this: we want to give our audience, as a brand, what they are looking for, so that they have an inherent connection with the brand. But with the increase in the number of digital channels out there, and the variety of content formats these channels bring to the table, there is a plethora of formats in which content can be delivered, and this makes content marketing more and more challenging.
So before we move forward, let us remember two things. One: in the world of content marketing, you create content that resonates well with your audience, with the objective of increasing their brand loyalty. Two: in doing so, you need to understand that this content must be delivered in the way the audience consumes it, and there are many different channels and mediums through which they consume, so even the same content needs to be delivered differently depending on how it will be consumed by the end audience. With that in mind, let us look at the two key challenges I would face as a content writer in this era of content marketing. The first is handling the velocity at which digital channels demand content. In the pre-digital era the cost of publishing was always heavy, but today publishing a blog post or a tweet is easy, and people are always looking for new content, so there is an inherent speed at which content needs to reach the end audience if we want to engage our customers continuously. The second, as I previously mentioned, is the proliferation of channels: there are mobile and web channels, and even within the web there are blog posts, infographics, and the social media platforms, each of which comes with its own unique requirements for the kind of content it needs. So as a content writer I need to handle both the velocity at which I deliver this content and the optimization of that content for different channels. This is where technology can
help, and these are the two things I will touch upon through the rest of my talk: how natural language processing, and some simple NLP tools put together, can aid at each of these stages. First, let us talk about handling content velocity with content ideation and content curation. As a content marketer, one scenario you often end up in is staring at a blank page, wondering what content to post that will best engage your customers. This becomes harder because of the many channels you are creating for. As the brand Fitbit, or as the brand Adobe, which is where I come from, it is difficult to come up with content that is not simply a marketing promotion, because promotions come naturally from your marketing and sales teams; content that relates directly to your end customers does not come so easily. One place technology can help is in finding content that is currently trending, relevant to the brand, and likely to resonate with the community. With some simple data analysis and a few out-of-the-box natural language processing tools, we can find such content fairly easily. Let us see how. We will start with three different aspects: what is currently trending, what my brand's interests are, and what my community is interested in. So,
as data scientists who live with data day in and day out, we know there are different data sources that speak to each of these aspects. Let us see what some good candidate sources could be for each category. First, to capture community interest, I need to tap into the sources where my communities express themselves. For a brand like Adobe, there are the social pages the brand owns, where people come in and comment on its posts. Similarly, on its own website, the brand has analytics around which information pages people visit and which blog posts they engage with. These statistics are always available for the owned content of any brand, so it is easy to lift this data, and by data here I mean the content that has gone out and the engagement each piece of content has received. On social media we can get the posts that were published and the reactions to them; for a brand this is easy because it is data the brand already owns, not data you have to go anywhere for. Then, coming to brand interest itself: every brand with a digital presence does what is known as social monitoring, where it takes a set of keywords into the social platforms and pulls the feeds that talk about those keywords. This, again, is data directly available from the social marketing department of your brand. And then there are the online aggregators, like Bitly, TinyURL, and similar link aggregators, that collect data from multiple sources and surface what could be interesting for your brand. Finally, for current trends, we can go to the publicly available sources that report what is trending right now. At the end of this exercise, what I have is a set of content data, what has been or is being posted about my brand, along with the activity around it. For a social page I might collect the number of likes a post got and how many times it was retweeted; for my own pages I have visit counts, bounce rates, and the other analytics on top of the content. With this, I can do some simple analysis: I go into the textual content and extract the key named entities, noun phrases, and other entities from it. Natural language processing practice has matured to the point that several out-of-the-box tools help with this; one popular example is the Stanford parser, which, given an input text, returns the interesting noun phrases and entities it contains. Next, as any data scientist knows, we need to put this into a representation useful for further analysis, and one popular out-of-the-box choice is TF-IDF, term frequency-inverse document frequency, which accounts for how often a term occurs in a particular document against how common it is overall. This brings up the importance of the keywords that matter for the document being analyzed and suppresses those terms that
are appearing across several documents. A simple example with the brand Adobe: a term like Photoshop, or one of the other Adobe products, could be very popular and appear in many documents, meaning most of my documents will mention something related to Photoshop. For that matter, if I am building these feature vectors for Adobe, the term Adobe itself will very likely appear in every other document I analyze, so it will have a very low inverse document frequency, the second term here, which suppresses that feature. And rightly so: if I am the brand Adobe looking to create content for content marketing, suggesting Adobe itself as an idea does not really help, because it conveys nothing new for the content. TF-IDF is a standard feature representation, and we can extract it with out-of-the-box tools as well. Now what we have is a vector representation of the set of entities from our content, and a set of metrics indicating how well each piece of content performed across the different places, be it the social platforms or the analytics on our own website. Next, using the TF-IDF vectors as features, I can build a predictive model against these different KPIs, such as likes, if I am looking for ideas around likes on a particular social platform, or a combination of likes, comments, shares, and the number of visits a page has received. With the TF-IDF vectors as the features and each of these KPIs as the target, I can then go back and compute the importance of each of these entities. It could be the beta coefficients in a linear regression, or a more complex feature importance in a more complex model. At the end of it, I can see which of these entities have been most important in explaining the particular KPI I chose. What this yields is a ranked set of keywords that tell me which topics have resonated well with my community, which things are currently trending, and so on. For instance, take one of the ideas that came up here, the term blur gallery. If I go back into my data and see where this term appears, I find it has appeared in several Facebook posts that garnered a lot of likes from the community, indicating that this is one topic that could be of real interest to my community, so I can generate new content around it and engage my community better.
Now we have different sets of topics from various places, from community interest and from what is currently trending, and I can combine these ranked lists because they are all on the same scale, the feature importances from my models. As I said, a term like Adobe, which appears in every other piece of content, is naturally ranked down because of the representation we used. What I also need is the third class of signal I talked about, relevance to the brand, which can come in as a separate score on top of the ideas we have; this can be computed using something like DBpedia or other knowledge bases, as indicated in the reference here, yielding another ranked list of ideas. So now I have two ranked sets of ideas, or two ranked sets of entities, which we can otherwise call topics, on which I can build my next content, and I need to aggregate them. There are standard rank aggregation techniques that combine two rankings of entities into a single ranking, which can be leveraged to come up with a final set of ideas. What I have now is a set of topics that my community will be interested in, for which there are interesting trending articles I can leverage to construct my next content, and which at the same time are relevant to what my brand is about. If you have not realized it so far, all this data is related to Adobe, the company I come from. For Adobe, it turns out that something around Photoshop or Photoshop user are among the top ideas that people can start building on.
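One simple rank-aggregation option is a Borda count, where each list awards points by position and the totals give the combined ranking. The talk does not name a specific aggregation method, so both the technique and the example lists below are my illustration.

```python
# Borda count: each ranked list awards points by position
# (top of a list of n items gets n points, the next gets n-1, ...)
# and the point totals give the aggregated ranking.
from collections import defaultdict

community_rank = ["blur gallery", "photoshop", "summer tips"]
trending_rank = ["photoshop", "photoshop user", "blur gallery"]

def borda(*rankings):
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos  # higher position, more points
    return sorted(scores, key=scores.get, reverse=True)

print(borda(community_rank, trending_rank))
# -> ['photoshop', 'blur gallery', 'photoshop user', 'summer tips']
```

A brand-relevance score, as from a knowledge base, could be folded in as a third input ranking in the same way.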
So as a content marketer I can choose one or two of these ideas to build my further content upon. For the course of this talk, let us choose two of them, Photoshop and blur gallery, and say that I, as a content marketer, want to construct some content around them. In the next part of the talk you will see how we can leverage the content already present in my repository to quickly construct content given these topics. Before that, to summarize this part of the talk: to address the velocity at which content is required for content marketing purposes, we need to come up with content at that speed, and there is a good chance we will often run out of ideas; we can use technology to bring in new content ideas, and putting together some simple NLP tools over data we already own can yield good ones. But a challenge still remains: we have these content ideas, and now I need to work out how to pull content from my own repository into the final content I want to publish. That is the second part of my talk, where, given some content ideas and my repository of existing content, I want to put this content together and optimize it for my different channels. Before that, we need to remember the personas we are targeting here: we are going after content writers, who are inherently creative individuals, and they do not like technology hampering their creativity.
But at the same time, a technological assistant that aids their creativity is important, and that is what we aim for in this second part of the talk, which also covers some of my own work. For a content writer in this era, as I previously mentioned, the challenges are these: they need to create several variations of the same content for different personas and different channels, and they need to personalize the content for different demographics. This is a non-trivial task. Creative writers can certainly do it on their own, but what we do with our technology is provide them with a platform to start from, rather than starting at ground zero. Getting back to our previous example, we chose to write around the topics of Photoshop and blur gallery, which is the sort of input a standard content assembly system can start with. Now we can go into our own repository and collect different pieces of content written in the past that are related to these topics or ideas. For this we can easily leverage a Lucene-based index or Solr indexing. Another point to note: we can try this on our own repository and avoid any issues of plagiarism, because at the end of the day it is brand-owned content. We should not try this across anyone else's content, because that always runs the risk of plagiarism, which is not part of this talk and not something I advocate. The underlying assumption here is that we have a large repository of content that people have produced in the past and can use to build their next content on.
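A rough stand-in for this retrieval step might look like the following. A real deployment would query a Lucene or Solr index; here, TF-IDF cosine similarity over an invented toy repository plays that role purely for illustration.

```python
# Stand-in for the index query: score repository fragments against
# the chosen topics by TF-IDF cosine similarity. The fragments are
# invented; in production a Lucene/Solr index would return these
# fragments with relevance scores directly.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

repository = [
    "The blur gallery in Photoshop adds tilt-shift and iris blur.",
    "Export presets speed up your video workflow.",
    "Photoshop layer masks let you edit non-destructively.",
]
query = "photoshop blur gallery"

vec = TfidfVectorizer(stop_words="english")
doc_vecs = vec.fit_transform(repository)
query_vec = vec.transform([query])

relevance = cosine_similarity(query_vec, doc_vecs).ravel()
for score, frag in sorted(zip(relevance, repository), reverse=True):
    print(f"{score:.2f}  {frag}")
```

These relevance scores are exactly the input that the assembly optimization in the next step consumes.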
So now, when I issue this query against my repository, I get a set of content fragments relevant to the topic I am trying to build upon. I then score the relevance of each of these content pieces to the topic I set out to construct on; these scores can come from a standard indexing engine, which returns a relevance measure along with its results. Now comes the interesting part. I have these scores of how relevant each piece is to my topic, and I need to put the pieces together into a final draft that can be handed to my content writer as a starting point. Note that we could stop here and simply give these content fragments to the writer, who could still build from them, and that would already be useful. But if the topic is fairly broad, for a larger enterprise there is a risk of pulling up a great deal of such content, so there is a need for further optimization when putting these pieces together. For that we can use a greedy algorithm that chooses the best content given the topic I am trying to build upon. One such greedy formulation is an optimization function that trades off the relevance obtained from the search or indexing engine, which signifies how relevant each piece is to the topic, against how diverse the selected content remains as the draft is built up. This trade-off is important in the context of content assembly.
The reason is this: if we go to a search engine, to Google, and type some topic, we are absolutely fine if the first three or four results are very similar to each other. However, when it comes to content assembly, a content writer does not want a lot of redundant content going into his own piece. Therefore, the platform we give him also needs to account for this and provide a diverse set of content, so that he can pick from different things and build his own content. Based on that, we are optimizing between relevance to the initial topics and the overall diversity of the final content. For the similarities here, several options are available: a simple cosine similarity between the vector representations, or a more involved deep-learning-based similarity metric; which to use is something we can still explore depending on the final objective. Now, on to the optimization function. As I said, each fragment has a relevance score for my topic. The first step is to select the fragment most relevant to the topic I am constructing on, which in this case is the first piece of content, with a relevance of 0.8. But as soon as one fragment is selected, because we are optimizing between relevance and overall diversity, some of the remaining fragments get a reduced score, because they are similar to the one already selected. The optimization continues iteratively, selecting the next best fragment each time, and then the next.
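The relevance-plus-diversity iteration just described is close in spirit to Maximal Marginal Relevance (MMR); the talk does not name the exact objective, so MMR here is my assumption, and the relevance scores and pairwise similarities below are invented.

```python
# MMR-style greedy selection: at each step pick the fragment with the
# best trade-off between relevance to the topic and redundancy with
# what is already selected. All scores here are illustrative.
import numpy as np

relevance = np.array([0.8, 0.7, 0.6, 0.5])   # relevance to the topic
similarity = np.array([                      # fragment-to-fragment
    [1.0, 0.9, 0.2, 0.3],
    [0.9, 1.0, 0.1, 0.4],
    [0.2, 0.1, 1.0, 0.2],
    [0.3, 0.4, 0.2, 1.0],
])

def greedy_select(relevance, similarity, k=3, lam=0.7):
    selected, candidates = [], list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr(i):
            # penalty = similarity to the closest already-selected piece
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

print(greedy_select(relevance, similarity))  # -> [0, 2, 3]
```

Note that the second-most-relevant fragment (relevance 0.7) is skipped because it is 0.9-similar to the first pick, which is exactly the score-reduction behavior described above.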
And I can continue this iteration to the point where I have the set of content needed to provide an initial platform, and then I can assemble the pieces into the sort of draft I can give the content writer. Note that what we have done here is pick up different pieces of content from different parts of my repository and stitch them together by optimizing for a few parameters, so that the content writer does not have to do the mundane task of digging through the repository, looking at multiple things, and bringing them together. These drafts can give him specific ideas; he can go in and modify the content further, or generate his own content using this as a platform, and move on to the next stages of his content creation. Now, what you saw here was a set of content pieces and a greedy algorithm selecting the most relevant ones based on an objective function. You can use a similar formulation to also select a good short version of the content once the author has created it. There are celebrated approaches in text summarization, a widely studied topic in natural language processing, that help select the key text from a larger piece of content and thereby find a shorter version of it. Speaking of text summarization, I will not get into the details; broadly speaking, there are two different flavors. One is abstractive summarization, which tries to do what human beings do: construct new sentences, new text, on its own.
While that is interesting, there are several open problems there, and it is still not something a content writer can use right away. On the other side is extractive summarization, which does something similar to what we did in our content assembly: ranking the existing pieces of content the writer has written. Here we considered a block of text as the unit of optimization, whereas summarization typically uses a single sentence as the text unit, but a similar optimization function can be followed to select the shorter version of the content. Now, to summarize this part of the talk: once we have some quick topic ideas to construct our content on, we can use our own repositories and a few optimization techniques to come up with a quick first version that the content writer can start from and refine further. There is one small thing I want to share before I open up for questions. We have seen how to get new content ideas, and how to use our repository to put content together and optimize it for various sizes depending on channel requirements. Another problem widely studied in natural language processing is content adaptation. Often in marketing scenarios, particularly in content marketing, where the same content has to target multiple audiences, we need that content personalized for different folks.
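A minimal extractive summarizer in the spirit described, scoring each sentence by average word frequency and keeping the top ones in their original order, could look like this. Real extractive systems use more involved scoring; this sketch, with invented text, is only illustrative of the sentence-as-unit idea.

```python
# Toy extractive summarizer: the text unit is a sentence, each
# sentence is scored by the average corpus frequency of its words,
# and the top-scoring sentences are kept in original order.
import re
from collections import Counter

text = (
    "The blur gallery ships with Photoshop. "
    "It offers tilt-shift, iris, and field blur. "
    "The weather was nice yesterday. "
    "Field blur works well on landscape photos."
)

sentences = re.split(r"(?<=[.!?])\s+", text.strip())
freq = Counter(re.findall(r"[a-z]+", text.lower()))

def score(sentence):
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(freq[t] for t in tokens) / max(len(tokens), 1)

top = sorted(sentences, key=score, reverse=True)[:2]
summary = " ".join(s for s in sentences if s in top)  # original order
print(summary)
```

The off-topic weather sentence scores lowest and is dropped; a relevance-plus-diversity objective like the one used for assembly could replace the frequency score here with no change to the overall structure.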
For example, in this particular case I have a more professional, formal person and a college kid, and if I have to send the same marketing message to both, the kinds of references will be different. There are fairly deep studies in natural language processing, like the references given here, that build a language model for the demographic you are going to target and use those language models to go back into your marketing messages, modify them, and adapt them so that they suit the particular audience. If you are further interested, I would refer you to these references, which walk through the state of the art in content adaptation. With that, I would like to thank you for your patience, and I will open up for questions. Thank you.

Very interesting. I have two questions. One: is there a budget on the size of the content? I did not see it in the optimization function.

The optimization function is an iterative algorithm, so there is a notion of a budget, and the budget can take different forms. In these greedy formulations, one budget is the size, which could be dictated by the channel. The other possibility is that even if there is no constraint on size, at some point you have a cutoff or constraint on the quality of the output content. There are studies on how such quality is measured, and you can set a threshold beyond which you stop.

The other question I had: since this is an extractive-summary-based technique, what about the coherence of the content?
I am guessing the content writer's job is then to bring in coherence among the different pieces. But did you try any automated ways of bringing coherence into this?

Coherence in automatic text assembly is something we are researching, but I have not shown it as part of this talk because it is still a subject of active research. Here we stop at the assembly and leave the burden of coherence to the content writer; you are right about that. But there are methods in the natural language processing field where people address coherence in text, which is what we are exploring in that space.

Thanks. Very interesting talk, and I am interested in knowing: with these techniques, how did your content marketing improve? Do you have benchmarks for that, how you were doing before versus your competitors versus after?

Okay, I do not have standard benchmarks with respect to the content marketing phase itself, but for the latter part, where we spoke about content assembly, we ran a human experiment with content writers within our company to see how the auto-generated content looks and how relevant it is to the particular topic they were looking to construct. We had very promising results, which is why we continued the further explorations in that space.

Are you using it in production?

We did show the second part at one of our customer meetups, and it is on the roadmap. It is not yet out in the product, but it is something we are actually integrating into it.

Hello. Thanks for the talk.
One thing I wanted to know: because you're assembling new content from existing content, have you seen any risk of stale content getting recycled again and again? How do you bring freshness, new material, into the writing?

Got the point. In the enterprise, one thing copywriters do very frequently is repurpose a lot of existing content for different scenarios, and this work was targeted towards such repurposing scenarios. But of course, as I said at the start of my second part, content writers are, at the end of the day, creative individuals, and in creative work, as for marketers everywhere, black boxes are not something they really like. So this is for facilitating their work, not for replacing them or automating every part of it. The idea, even in this content assembly, is that in some scenarios they might take the assembly, optimize and fine-tune it further, and use it for their final publishing. But there could also be cases where they see it and decide they can write a completely new piece of content around the theme, like the one shown here.

My name is Sudarshan. I didn't understand each and every step of the approach, but I want to know why the algorithm is called greedy. If you look at the famous example, the minimum spanning tree, it can be shown that no other algorithm can improve on the greedy one, no matter what you do. So is there a non-greedy algorithm here? Does this approach dominate every other approach, if you had a lot of resources?
Okay, so the whole notion of content assembly, of pulling different pieces of content together, has been in the NLP literature for only about two or three years, and there are still no standard benchmarks. The reason it is called greedy here is that the approach does not try to optimize the overall flow; it picks one piece at a time and optimizes for that particular step. In that sense, it is a standard greedy algorithm. And, as I said, the area is still so nascent that there are no standard benchmarks to say whether greedy outperforms the alternatives.

Thank you. Hello. Thanks for a nice talk; it was very interesting. My question pertains to Adobe being a global company: how are you targeting non-English-speaking customers? My understanding is that NLP is more mature for English, or that the libraries are predominantly English-oriented. How do you cater to non-English-speaking countries, like European countries?

As a research lab, when we develop, we start with English because, as you said, it is a more established language in terms of processing. But for several other languages, some of these tools are also available, and most of the techniques I spoke about here work on top of standard natural language processing; the analysis that happens on top of that is not tied to English or to any particular language. With that confidence, we build our first prototypes in English, then move on to other languages and explore how the language parsing can be done.

Hi, I have a question. Can you explain the relevance of this to Adobe?
Basically, what are you doing, and can you give an example of where this content marketing can be used?

Yes. Okay. A prominent part of the Indian audience knows Adobe as a digital media company, known for Photoshop and Creative Cloud. But Adobe has about a $1.5 billion business in digital marketing, where it supports the entire lifecycle of a digital marketer, and major retailers like Walmart are among our customers; it has a wide presence across the US and Europe. Within that portfolio of products, Adobe has a content management system that helps put this content together, and under that umbrella, content marketing is one of the offerings.

I understand that, but this content that you're going to deliver is targeted, and you're trying to make it more relevant. Why are you getting this content from all over the place? Is this content advertisements? What kind of content are you looking at here? Is it ads?

These are more along the lines of blogs. The particular content that we try to optimize here is blogs, information pages, documentation pages, and similar places where they create such content and repurpose it for different purposes. That is the end use case.

Hello. Yeah, Balaji, great talk. I just wanted to understand: you looked at an algorithm for detecting topics that are close to each other, but you also wanted dispersion between those topics, right? Some kind of diversity. And when you look at metrics like cosine similarity, they are trying to identify topics that are closer. In your example with the blur, for example, right?
How do you make sure that the keywords don't get dropped? Yes, your TF-IDF will give you the most important words, but how do you make sure that, as you introduce dispersion or diversity between the topics, some of the keywords don't get dropped, and you don't end up with a dispersion that is too far from the center, so to speak?

When we are doing the topic ranking, we don't use similarity there; we just look at each of those individual features and rank them based on the KPI we want to optimize. The similarity comes in when we are doing the content assembly, where it is computed at the level of the entire content block, not at the word level, over a full chunk of content. And in that case, we are using the similarity not to find blocks that are close to each other but, if you look at this, let me get back to that. Yes, so if you look at this, there is a negative factor here, the minus alpha, which accounts for the diversity; it is the optimization for diversity, in some ways. In other words, we are penalizing any content that is very close to the already constructed content, so that we get more and more diverse content into our final assembly.
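The "minus alpha" diversity penalty the speaker describes resembles Maximal Marginal Relevance (MMR). Below is a hedged sketch of that idea; the word-overlap similarity, relevance scores, and alpha value are illustrative assumptions standing in for the talk's content-block similarity:

```python
# Sketch of an MMR-style selection: each step picks the block whose
# relevance, minus alpha times its similarity to the already assembled
# content, is highest. All scores here are hypothetical.

def overlap(a, b):
    """Word-overlap (Jaccard) similarity between two content blocks."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def assemble(relevance, alpha=0.6, k=2):
    """Greedily build a diverse assembly of k content blocks."""
    selected, pool = [], list(relevance)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda b: relevance[b]
            - alpha * max((overlap(b, s) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected

candidates = {
    "summer workout tips": 0.90,
    "summer workout plans": 0.85,  # relevant but redundant with the first
    "healthy eating guide": 0.60,
}
# The penalty pushes the redundant block below the more diverse one.
result = assemble(candidates)
```

With no penalty (alpha = 0) the two near-duplicate blocks would win; the penalty instead trades a little relevance for diversity, which is exactly the behaviour described in the answer above.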