Thank you for staying — so many people at the end of such a stimulating day. The organizers have put me between you and the jam session, so let me get started and try to be on time. The organizers have requested that questions be held until the end, but if something is pressing, just let me know during the talk. I'll try to finish about 10 minutes before the scheduled time so that we have plenty of discussion time. My name is Shourya Roy; I'm a scientist and manager at Xerox Research in Bangalore, and the talk I'm going to give today is titled "Transfer Learning in Practice". There are two parts to the talk. In the first part, I'll give a very basic introduction to transfer learning, covering the topic at a very high level so that you know what we are talking about. I'm sure there are many knowledgeable people here for whom this will be a repetition, so pardon me for that. In the second part, I will talk about how we put it into practice — that's the focus — and that work has been published in multiple conferences in artificial intelligence and computational linguistics last year and this year. So, getting started: what is the motivation for this talk? The first diagram you see on your screen is a cartoon depicting a very common supervised learning framework. We have training data from some domain, and algorithms that can be trained on that data to create something called a model, which is then applied to test data. Again, most of you will be familiar with this. A major assumption behind this paradigm concerns the nature of the training and test data: they are expected to be the same.
Speaking a bit more formally, the joint distribution of the data and the class labels, as it is called in the supervised setting, is assumed to be the same for both the training and the test data. Under that assumption, all the work that has gone into learning theory holds true. If it doesn't hold, we have a problem. Let me give you an example. We are probably all familiar with sentiment categorization: user-generated content needs to be categorized into groups like positive, negative, and neutral, or on a scale of 0 to 5, or something like that. Commercial entities and organizations are interested in knowing what people think about their products. We wrote a sentiment classification algorithm and tried it on several product categories — books, DVDs, kitchen, and electronics — which is part of a very standard data set used in this literature, with reviews collected from a popular e-commerce website. We built a sentiment classifier for one of these products, applied it to test data from the same product, and got a classification accuracy of about 87-88%. But if instead we train on a different product's reviews and apply the model to the earlier domain A, we see that the accuracy drops by almost 20%. That is where the motivation for this problem lies: you cannot train a model on one data set and apply it to a data set that is very different, because the performance degrades. Coming back to the formal view, what is essentially happening is that the joint distributions of these two data sets do not match.
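To make the degradation concrete, here is a toy, hypothetical sketch — not the speaker's experiment or data set — in which a bag-of-words nearest-centroid sentiment classifier trained on book-review-like text performs well in-domain but falls back to guessing on kitchen-product-like text, simply because the discriminative vocabulary does not overlap:

```python
from collections import Counter

def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def train_centroids(docs, labels, vocab):
    """One mean count vector per class (a nearest-centroid classifier)."""
    centroids = {}
    for lab in sorted(set(labels)):
        vecs = [featurize(d, vocab) for d, l in zip(docs, labels) if l == lab]
        centroids[lab] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return centroids

def predict(doc, centroids, vocab):
    v = featurize(doc, vocab)
    return max(centroids, key=lambda lab: sum(x * y for x, y in zip(v, centroids[lab])))

# Source domain: book reviews (labeled); the vocabulary comes from here only.
train_docs = ["a gripping and compelling read", "compelling characters gripping plot",
              "a dull and boring read", "boring plot dull prose"]
train_labels = ["pos", "pos", "neg", "neg"]
vocab = sorted({w for d in train_docs for w in d.split()})
model = train_centroids(train_docs, train_labels, vocab)

# In-domain test data (books) vs cross-domain test data (kitchen products).
books_test = [("gripping plot", "pos"), ("dull prose", "neg")]
kitchen_test = [("sturdy and durable pan", "pos"), ("flimsy leaky pan", "neg")]

def accuracy(pairs):
    return sum(predict(d, model, vocab) == y for d, y in pairs) / len(pairs)

in_domain, cross_domain = accuracy(books_test), accuracy(kitchen_test)
```

In-domain accuracy is perfect here while cross-domain accuracy collapses; the data and numbers are invented and mirror the drop described in the talk only in spirit.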
As a result, the performance of the algorithm degrades. Why does this happen? Well, there are many products and many domains — I'll talk about the notion of a domain a little later — and many things change between training and test. The other reason is that, even within the same domain, given the astronomical rate at which user-generated content is produced these days, the training data can simply become outdated, and you cannot apply it to the data being generated today. You then need to retrain the model: have a set of people label new data and train again. That's the challenge we would like to address here. Summarizing what I just said: you have a domain A and a domain B, and you want to build the same learning system for both. If you do this in the traditional way, you have to build the system from scratch every time, getting labeled data each time. That is not only time-consuming and expensive and requires involvement from human labelers, but you also cannot do it fast enough if your data is changing at a very fast pace. Transfer learning, in contrast, allows the domains, tasks, and distributions used in training and test to be different. We'll see how this is done, but that's the idea, and it is based on the fundamental philosophy of reusing knowledge: the knowledge you learned while building the system once is used again. This is very similar to what humans, or maybe children, do quite effectively. Learning to recognize apples can help you identify pears, right? Or learning to play cricket can eventually help you play baseball — with a caveat, maybe; I'll come to that.
So that's essentially the philosophy: we do not want to go through the expensive labeling process every time. Here is another cartoon, a very simplified flow of transfer learning, just to make the key points. There is a source domain, where a lot of labeled data is available. There is a target domain, where a lot of unlabeled data is available — which is very inexpensive to get — and in some cases a very small amount of labeled data for the target domain may also be available. These are fed into a transfer learning algorithm, which creates a predictive model that can now be applied to the target domain test data. So you have a model for the target domain without needing the large amount of labeled data you had in the source domain. With minor differences that are not relevant for this talk, this is also referred to as the domain adaptation task. There are other related terminologies, like multi-task learning, where you try to learn for both the source and target domains together; we are not covering that. Here, using the source domain, we want to get the best performance in the target domain — that's the goal. Now, I should clarify the word "domain", because it will keep coming up. A domain does not have to be as different or as big as, in popular terminology, health care versus finance. It can be a much smaller collection within a larger domain — for example, one health care company versus another. For this talk, we can say that any two collections which do not match with respect to certain attributes are two domains.
And what are those attributes? Let's look at the transfer learning settings that commonly occur. There are two key notions in transfer learning. One, which I have already been referring to, is domains — I will clarify that a bit more formally here. The second is tasks — that's the next slide. A domain consists of two components: a feature space and a marginal probability distribution on that feature space. Going back to the sentiment classification example, the feature space is the list of all words — the vocabulary, as we commonly call it in text classification. The marginal probability distribution is the probability of a particular set of words appearing in a review. Again, I'm putting this simply, but essentially these two attributes define a domain, and as you can now see, they can differ between any two document collections; it doesn't have to be between domains in the popular sense. Given these two components, domains can differ in two ways. One is that the feature spaces differ. An example: web pages written in English and in Chinese. You cannot build a web page classification model on English pages and apply it to Chinese ones, because the features are different. The other is that the marginal probability distributions differ. Even if two sets of web pages are both written in English — say, blogs on politics and blogs on religion — can you take a classification model built within politics and apply it to the ones on religion? Probably not. That's the other way two domains may differ.
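The "marginal distribution" half of this definition can be made operational: estimate a smoothed unigram distribution for each collection over a shared vocabulary and measure how far apart the distributions are. The sketch below is my own illustration (not from the talk), using Jensen-Shannon divergence; two politics-like collections end up closer to each other than a politics collection and a religion collection:

```python
import math
from collections import Counter

def unigram_dist(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution P(w) over a shared vocabulary."""
    counts = Counter(w for d in docs for w in d.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def js_divergence(p, q):
    """Jensen-Shannon divergence: 0 for identical distributions, at most ln 2."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    kl = lambda a, b: sum(a[w] * math.log(a[w] / b[w]) for w in a)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

politics_a = ["the election vote parliament policy", "vote policy election debate"]
politics_b = ["parliament debate election policy vote"]
religion   = ["faith temple prayer scripture", "prayer faith scripture ritual"]

vocab = {w for docs in (politics_a, politics_b, religion) for d in docs for w in d.split()}
d_same = js_divergence(unigram_dist(politics_a, vocab), unigram_dist(politics_b, vocab))
d_diff = js_divergence(unigram_dist(politics_a, vocab), unigram_dist(religion, vocab))
```

A high divergence between two collections is a signal that, for transfer purposes, they should be treated as different domains even within the "same" topic area.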
As I mentioned, the other characterizing notion beyond the domain is the task. A task again consists of two components: a label space and a function. The label space is the set of class labels we would like to assign to different objects — for sentiment categorization, these were positive, negative, and neutral. The function is essentially what we are trying to learn: given an unknown test example, which label category does it belong to? Probabilistically, it can be written as P(y|x) — given a feature vector x, what is the class label, learned from the training data that has been provided? Here too, these two components can differ. If the label spaces are not the same, the common example is binary versus multiclass classification, where you have two labels versus n labels and probably don't know the connection between the two label sets. The second case is where the label spaces are the same but, as we call it, the decision function has changed: the label sets may match, but the way the labels are characterized by the feature vectors differs between the two cases. Under this setting — covering these four combinations, two by two — the transfer learning literature has been developed since 1995, when the topic was first introduced in a NIPS workshop called Learning to Learn.
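Put formally — following the standard notation of the Pan and Yang survey mentioned in a moment — the two notions from the last two slides can be written as:

```latex
% Domain: a feature space and a marginal distribution over it
\mathcal{D} = \{\mathcal{X},\, P(X)\}, \qquad X = \{x_1, \dots, x_n\} \subseteq \mathcal{X}

% Task: a label space and a predictive function, learned from (x_i, y_i) pairs,
% interpretable probabilistically as the conditional distribution
\mathcal{T} = \{\mathcal{Y},\, f(\cdot)\}, \qquad f(x) \approx P(y \mid x)

% Transfer learning setting: improve the target-domain predictor using
% source-domain knowledge, when the domains or the tasks differ
\mathcal{D}_S \neq \mathcal{D}_T \quad \text{or} \quad \mathcal{T}_S \neq \mathcal{T}_T
```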
A vast amount of literature has been developed in this field, but if I could summarize it in one slide, there are three fundamental guiding questions, which I have taken from a very nice survey paper by Pan and Yang. If you are interested in this topic and would like to get yourself introduced, it is a very good paper to start with; it appeared in the IEEE Transactions on Knowledge and Data Engineering in 2010. The three questions are: first, what to transfer. Some knowledge — and I have to keep the word "knowledge" abstract — is specific to individual domains or tasks, and some knowledge may be common between the domains under consideration. We need to identify what to transfer, and that's the first question to answer. The second is how to transfer, and that is where all the algorithms and techniques have been developed; I will touch on what kinds of techniques exist in the next slide. These first two questions have attracted probably the maximum amount of work in the field. The last one is when to transfer: under a particular scenario, given two domains in the sense we defined, is it worthwhile to transfer at all? To give an analogy going back to my earlier examples: knowing that a tennis ball is a sphere probably doesn't help me with an unrelated recognition task, and knowing how to play a nice leg glance in cricket probably doesn't help me play better baseball. In such cases transfer is not going to help, and that phenomenon is called negative transfer.
When to transfer is a topic that has received relatively less attention, but in the second part of the talk I'll explain why it is crucial when we talk about practically deploying such techniques. Selecting or combining multiple sources is a related question — should you adapt from one source or from several — which I will touch on towards the end, just so you are aware it exists. Now, the final slide of the first part. The transfer learning literature, again very nicely categorized by Pan and Yang, proposes many different kinds of algorithms — hundreds of papers have been written on these things — but they can be grouped into four broad buckets. I will not go into the details, but just to give you a sense: what do instance-based approaches mean? The intuition is that when you have a source domain with labeled data and want to learn a classifier for the target domain, only certain instances in the source domain are going to provide useful information for classification in the target domain. Identify those — that's your knowledge, your "what to transfer" — and the "how" is that you give them higher importance in your supervised classification algorithm. With plenty of classification algorithms, you can simply re-weight certain instances up or down based on how much useful information they carry from one domain to the other. That's essentially the principle of instance-based approaches. Continuing with the second bucket, feature-based approaches — what is the intuition there? Again, plenty of papers are referenced in the survey if you are interested in the actual techniques.
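One classical realization of the instance-based idea is importance weighting: estimate how "target-like" each source instance is and weight it accordingly during training. The sketch below is an illustrative stand-in, not the specific method of any paper mentioned in the talk; it approximates the density ratio P_target(x)/P_source(x) with smoothed unigram language models:

```python
import math
from collections import Counter

def unigram(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary."""
    c = Counter(w for d in docs for w in d.lower().split())
    total = sum(c[w] for w in vocab) + alpha * len(vocab)
    return {w: (c[w] + alpha) / total for w in vocab}

def importance_weight(doc, p_target, p_source):
    """Estimated density ratio P_target(doc)/P_source(doc) under unigram models."""
    log_ratio = sum(math.log(p_target[w] / p_source[w])
                    for w in doc.lower().split() if w in p_target)
    return math.exp(log_ratio)

source_docs = ["sturdy pan great handle", "gripping read boring plot"]  # labeled
target_docs = ["durable pan sturdy lid", "great pan flimsy handle"]     # unlabeled
vocab = {w for d in source_docs + target_docs for w in d.split()}
p_src = unigram(source_docs, vocab)
p_tgt = unigram(target_docs, vocab)

# Kitchen-like source instances get up-weighted; book-like ones get down-weighted.
weights = {d: importance_weight(d, p_tgt, p_src) for d in source_docs}
```

In practice these weights would be passed as per-instance weights (e.g. a `sample_weight` argument) to any learner that supports weighted training.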
I'm just going to give you the intuition here. In feature-based approaches, we know that the two domains are different and possibly come from different feature spaces. Can you define a new space such that, when both the source domain instances and the target domain instances are projected into it, the distance between them is reduced — ideally minimized? What does that buy you? In this new space, if I learn a classifier on the source domain instances — which are labeled, by the way — it will be very relevant for the target domain as well, because in this space the target domain instances are very similar to the source domain ones. Then you bring the labels back and associate them with the target domain instances. That's the intuition for feature-based approaches. And so on: there are also parameter-based and relational-based approaches in the literature. As I was saying, this has been a hot research topic in machine learning, artificial intelligence, computational linguistics, and computer vision, with tutorials, workshops, and papers everywhere. But our observation was: an extremely rich literature, yet far fewer applications in practice. That is the sweet spot we wanted to hit — to see how this vast literature can be applied as-is or by innovating on it, and, secondly, what new questions we need to answer to apply it in real life. That is where I will move to the second part of the talk. Let me pause briefly and see if there is any pressing question at this point.
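A deliberately simple way to see the feature-based intuition — loosely inspired by the "pivot features" of structural correspondence learning, and much simpler than any real algorithm — is to keep only the words frequent in *both* domains as the shared representation, so that projected source and target instances live in the same space:

```python
from collections import Counter

def shared_pivots(source_docs, target_docs, min_count=2):
    """Words frequent in BOTH domains form the shared feature space."""
    src = Counter(w for d in source_docs for w in d.lower().split())
    tgt = Counter(w for d in target_docs for w in d.lower().split())
    return sorted(w for w in src if src[w] >= min_count and tgt[w] >= min_count)

def project(doc, pivots):
    """Represent a document only by its pivot-word counts."""
    counts = Counter(doc.lower().split())
    return [counts[w] for w in pivots]

source_docs = ["great quality highly recommend", "poor quality do not recommend",
               "great plot great characters"]
target_docs = ["great quality sturdy pan", "poor quality flimsy lid",
               "recommend this pan recommend it"]
pivots = shared_pivots(source_docs, target_docs)
```

A classifier trained on `project`-ed source documents can then be applied directly to projected target documents. Real feature-based methods instead *learn* a projection — a shared latent space that explicitly minimizes the inter-domain distance — but the effect is the same: both domains end up represented by features they have in common.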
The organizers may not like it, because they said not to take questions. Okay, cool. So let me proceed. The problem we applied transfer learning to is social media analytics. I'm sure all of us know social media analytics: there is so much content and so much data available that if we cannot mine insights from it, it's a waste, right? I work for Xerox, and Xerox has a product called M-Path, a social media analytics product. It has some unique differentiating features, but it also does some very common things: sentiment categorization, topic categorization, geolocation identification, event detection, and things of that kind. We are going to focus on sentiment categorization and topic categorization — these are the two tasks — and, as I have been building up until now, transfer learning applies to supervised learning, classification, and categorization problems, which is why the topic is relevant to this real-life example. This is the typical workflow — a slightly busy slide, and I won't go into a whole lot of detail; I don't know if it is visible. There are four steps in the M-Path process: listen, analyze, engage, and measure. At a high level: listen is collecting or crawling the data; analyze is labeling the data, building the model, and validating it — the usual supervised learning workflow; engage is applying those models and seeing how they work in real life; and measure is measuring the impact — ROI, business impact, and things like that. The module of interest to us, the one we wanted to hit, is the analyze module.
As you can see, this is the step where expensive, time-consuming human annotation comes in every time. And here the notion of a domain is again very important: any collection we crawl, say from Twitter, becomes one domain. For example, there have been a lot of tweets around Fifth Elephant. If we want to know what you are saying about Fifth Elephant — positive, negative, or neutral — I collect all the Fifth Elephant-related tweets, which is very easy to do, and then I want to build that model. Now, this is something I have to do really quickly. I have to do it today and give it to Zaina, saying: here is the feedback, especially the negative comments — see if you can address them by tomorrow. If I give it to her on Monday, I'm sure she won't mind, but it will be of much less use, right? Maybe we could address it in the sixth edition of Fifth Elephant. So we have to do this really fast, and that's why you can't have people sitting and labeling 200-500 tweets, building the model, and so on. That's the first reason; the second is the fast-evolving data I mentioned earlier. So retraining the model from scratch on newly labeled data is not a feasible solution here. We therefore proposed this transfer learning methodology — let me quickly skip this slide, which essentially says it does two things: minimize human annotation and reuse previous knowledge. The key point on this slide is that we are hitting the analyze step, which is the most time-consuming one. This is a text classification task: after understanding the broader business context with my colleagues, I essentially convert it into a technical, scientific problem, and the scientific problem here is text classification.
So we have a lot of labeled data from different existing collections, and we have to build a model and apply it to all the Fifth Elephant tweets — my target domain — for which I have no labeled data. The challenge is that if I apply a model directly, as I showed in the sentiment classification example at the beginning, there will be a significant performance drop, and hence models would have to be trained from scratch every time. So we have done our job of converting the real business problem into a transfer learning problem, right? At this point we could have gone ahead — and we did indeed apply one of the feature-based algorithms from the literature and looked at the outcome we were getting. But through a lot of experimentation and brainstorming, we realized there were two key challenges we still needed to address to make this practically feasible in real life. One: while the literature says "there is a source domain and a target domain", in our case we had hundreds of collections already labeled and tagged by people. We have to understand which domain to select — from where to transfer — for each new collection. That is where we introduce the notion of similarity-aware transfer learning: transfer from a collection that is similar to the target collection. If you are trying to label the Fifth Elephant collection, label it using, say, tweets from another big data conference — not from medical insurance tweets. The other challenge — and this is part of the algorithm I will explain on the next slide — is that most algorithms in the literature do a one-shot transfer.
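Similarity-aware selection can be sketched as ranking the already-labeled collections by content similarity to the new target collection. Below is a minimal cosine-over-term-frequencies version — my own illustration, with invented collection names; the product's actual similarity measure is not specified in the talk:

```python
import math
from collections import Counter

def tf_vector(docs):
    """Term-frequency vector for a whole collection."""
    return Counter(w for d in docs for w in d.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_source(target_docs, labeled_collections):
    """Return the name of the labeled collection most similar to the target."""
    tgt = tf_vector(target_docs)
    return max(labeled_collections,
               key=lambda name: cosine(tf_vector(labeled_collections[name]), tgt))

target = ["great talks at the big data conference", "machine learning talk today"]
labeled_collections = {
    "data_conf_2014": ["big data conference keynote", "scaling machine learning talks"],
    "medical_insurance": ["claim denied by insurance", "hospital policy premium"],
}
best = select_source(target, labeled_collections)
```

In the product the top-ranked collections are shown to the analyst, who makes the final choice, rather than the system always committing to the single most similar one.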
You do the feature space mapping, project, transfer, and you are done with it. Here, instead, we do what we call iterative transfer, or similarity-aware slow transfer: transfer over multiple iterations, depending on how well — or how confidently — we are doing as we progress. Now, as computer science people we love flow charts, so let me show one that is hopefully a little clearer than what I said on the last slide. These are the main steps in our proposed technique for applying transfer learning to social media analytics. The first step is similarity analysis, where we measure how similar one domain is to another. The next three steps — adaptation, model building, and result analysis — form the core transfer learning algorithm; for the details I would refer you to the paper mentioned on the previous slide, which describes this iterative transfer based on both similarity and the confidence we are getting. The result analysis leads to a question: is the transfer already satisfactory or not? If not — and this is another innovation, a little cherry on top, I would say — we bring the notion of active learning into the framework. While we have assumed there is no labeled data in the target domain, at this point we say: the adaptation we have done is not satisfactory because you have not given any labeled data — can you label 20-25 samples for me, and I will see what is the best I can do? Instead of selecting those 20-25 samples at random from the collection, we apply active learning.
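The adapt-test-query loop from the flow chart can be sketched generically. Everything below is a hypothetical skeleton — the callable names (`train`, `predict_proba`, `oracle`, `is_satisfactory`) are placeholders I've invented, not the paper's API — showing iterative transfer combined with uncertainty-based active learning:

```python
def iterative_transfer(source_X, source_y, target_pool, train, predict_proba,
                       oracle, is_satisfactory, batch_size=20, max_rounds=5):
    """Retrain, check with the analyst, then actively query a few target labels."""
    labeled_X, labeled_y = list(source_X), list(source_y)
    pool = list(target_pool)
    for _ in range(max_rounds):
        model = train(labeled_X, labeled_y)
        if is_satisfactory(model) or not pool:
            return model
        # Uncertainty sampling: pick the instances whose top predicted
        # probability is lowest, i.e. where the model is least confident.
        queries = sorted(pool, key=lambda x: max(predict_proba(model, x)))[:batch_size]
        for x in queries:
            labeled_X.append(x)
            labeled_y.append(oracle(x))   # the 20-25 manual labels per round
            pool.remove(x)
    return train(labeled_X, labeled_y)

# Toy run: a "model" is just a word -> label lookup; unseen words are uncertain.
POS = {"good", "sturdy"}
train = lambda X, y: dict(zip(X, y))
predict_proba = lambda model, x: [1.0, 0.0] if x in model else [0.5, 0.5]
model = iterative_transfer(
    ["good", "bad"], ["pos", "neg"], ["sturdy", "flimsy"],
    train, predict_proba,
    oracle=lambda x: "pos" if x in POS else "neg",
    is_satisfactory=lambda m: "sturdy" in m and "flimsy" in m,
    batch_size=2)
```

The toy run needs one active-learning round before the "analyst" (the `is_satisfactory` check) is happy; the real system repeats the same loop with a classifier and human labels in place of these stand-ins.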
Active learning — as many of you may know — is essentially identifying, from a big collection of unlabeled data, the instances that would most help you build a better classifier. We then feed that small amount of labeled data back into the algorithm, and the loop iterates until we get a satisfactory result. Once that's done, you save the model, apply it, and you are done. Let me show you a quick video — sorry, I should have opened it. It's muted, so I'll give a commentary. This is the social media product, M-Path, in which the transfer learning has been embedded. These are the different collections already in crawl mode. We have collected a particular collection and want to build a classifier for it, but we have no labeled data, so we go and add this collection to the system. On the right side, you see the relevant source collections from the corpus — collections that are already labeled — identified based on content similarity; we do not use any label information at this point. The user can go and see some of the comments crawled for the target collection, the one to be categorized. Then, among the suggested source collections, the user selects the one to transfer from — we could have automatically selected the first, most similar one, but we give the user this option — and the analyst can also look at the source domain comments to see whether they are good.
The Adapt and Test button then initiates the iterative module, where the source collection is used to train a model for the target domain. What you now see on the right is how many instances from the target collection have been classified as positive, negative, or neutral. We cannot compute an accuracy, because there is no labeled data at this point, so we just ask the analysts using the tool: this is the first round of results — go through some of the categorized comments and tell us whether it is satisfactory. If it is not satisfactory, we go to the Add Data module, which is the active learning step I mentioned: can you give me 20-25 samples to label? This is where the person labels the data as positive, negative, or neutral on the right side — a manual but optional step, used only if they believe the result is not yet good enough. We annotate all the selected comments in that round and then go to another Adapt and Test iteration, and this can go on until the person feels the result is satisfactory. And now that some of the data has been tagged, we can also compute an accuracy; otherwise there is no way to compute it, because there are no labels. Of course, if you had a held-out data set, you could still compute a validation accuracy or something like that, but in a real-life setting, the person doing social media analytics just goes through the comments and judges how well it is doing. Now, back to the slides — there is a bunch of results slides.
I'll go over only one. This slide shows an experiment where the test collection was on the Apple iPhone 6, for which we had no labeled data, and the source selected by the similarity analysis was a Huawei collection — a related domain. What the slide is essentially pointing out: the deep red bars indicate accuracy with transfer learning incorporated, and the light pink ones accuracy without it. The first bar, highlighted in green, shows how much accuracy you can get even without any labeled data from the target domain. If that number is good enough for you, you are done: you have just crawled a collection and built a sentiment classifier automatically, directly from an existing source collection. We obtained similar results for a bunch of other data sets and collections from social media. In summary, we observed that on average we could outperform the purely supervised model using only one fifth of the total labeled instances — a saving of almost 80% in manual annotation. More than that, it makes the system increasingly automatic as you go forward. Now, an important part of deploying such a technique in practice is that these accuracy numbers may not mean much to the actual practitioners; they are going to try the system out themselves. So we ran an analyst trial with a group of people who used to do this labeling, and asked them to try the tool and give us feedback and ratings on how well they thought it was doing.
At the end of the trial, their feedback was: usability 5 out of 5, reduction in effort 4 out of 5. So we got pretty good results not only in scientific accuracy numbers but also a lot of good feedback from the people actually using the tool. The system aspects and the analyst trial results are described in a paper at the North American Chapter of the Association for Computational Linguistics (NAACL) conference: the first paper was more about the technique, the second more about the trial and how it worked in practice with large practical data sets. Let me move on. This is another application, in the conversation labeling domain. The task there was engagement labeling: many conversations happen over social media channels around customer support — a query or question is asked, agents monitor the channels and are supposed to respond, and at the end each query needs to be marked as open, closed, solved, or something of that nature. This too can be modeled as a supervised classification problem — a fine-grained model is also possible, which we called the conversation labeling task. But again, if you model it simply as supervised classification, the same issue of domain divergence comes up in practice.
Without getting into much detail: we did the conversation analysis using sample data from some telecom operators and from a financial domain. These are very small data sets, but this is the part that actually helped us communicate the benefit of transfer learning even before we did the social media analytics, and here too we saw a significant benefit from using transfer learning rather than plugging the supervised model directly into the system. Sorry, I have to rush this part to make sure I leave enough time for questions. Let me come to the final part, a couple of thoughts on the further research that happened beyond the practical deployment of the transfer learning technique I mentioned. One common problem we realized while applying transfer learning on the social media data sets is that in the literature the transfer mostly happens from a single source to a single target, but in practice we often have multiple source domains available. For example, in the social media context we had hundreds of collections already labeled. Given a new collection, why should I always have to choose the single most similar source, even though we are choosing it in a similarity-aware manner? Why learn from only one source domain? Perhaps I can learn from multiple source domains effectively by extending the similarity-aware iterative algorithm, and that is where we introduced the notion of similarity and complementarity properties of domains, to identify the best K domains from which a transfer learning algorithm can learn and apply to a target domain.
This is more of a theoretical work, where we provided the theoretical justification for the source selection procedure and also gave guarantees in the form of mistake bounds for the MSID algorithm, showing that it cannot do worse than a certain bound, and things of that nature. And finally, a work that will be presented in a couple of weeks at the ACL conference: all the transfer learning examples I gave are examples of tasks where the labels are the same between the source and the target domain. If the source and target labels are not the same, and the tasks essentially vary, can we still apply cross-domain adaptation with disparate labels? That is the main focus of this paper, where we model the problem as an optimization problem, maximum coverage, and propose a greedy approach to show how you can select a new set of labels for the target domain even when you have no idea what those labels are going to be, based just on the number of labels. Now, concluding: what did we cover here? Machine-learning-based automation and the need for transfer learning; I hope I motivated why we need transfer learning in a practical setting. I gave a high-level introduction to transfer learning, and if you are interested I'll be very happy to give you further pointers; some of them I have mentioned in the slides. I showed one example of transfer learning in practice, for social media analytics; we are applying it to multiple other, very different problems to show how pervasive this could be. And I also talked about some of the ongoing research directions, with multi-source adaptation and disparate label sets. With that I'll stop here, and thank you. Okay, we can begin with the Q&A. If you have any questions, raise your hands. Hello.
So one question: the use cases you have mentioned are more on the social analytics side. When you need more accuracy, what is your recommendation? Can you give me an example of what you mean by more accuracy? I mean healthcare: if I am really looking at patients and trying to identify whether something is genetic or not, do you think this is really going to work? That's a good point, and that is exactly where we introduce this notion of satisfactoriness. You cannot just apply your transfer learning algorithm and expect that it is performing well. It may perform well, but you need a validation step to know whether the result is satisfactory or not. Yes, in healthcare the required level of accuracy, specificity and sensitivity, is much higher. But again, with manual intervention in place, we think we can apply it. So what you are saying is that the manual intervention may be greater in those cases where you need more accuracy? Manual intervention is a tool: you may apply rigid manual intervention when you are very specific about a particular outcome, while in social media analytics you can be less rigid about it. Okay, thank you. Hello. Hello. What I think the crux of this whole approach will be is to identify what is invariant from one domain to the other, and across time. What is the systematic way of extracting that thing which is going to be invariant? I just want an intuitive feel for it. Sure. Right, and indeed the invariance is the crux of it: you want to identify what you can transfer, because that is what is invariant across the two domains.
And the way of doing it is through these instance-based approaches and feature-mapping-based approaches. Essentially, you want to map to a different feature space in which the two domains are invariant, so that you cannot distinguish between them; if you can identify such a feature space, that becomes the basis. Absolutely, there are many techniques for this. I did not get into them, but for example even canonical correlation analysis is one way to identify what that invariant space is, and there is a very well-known technique called structural correspondence learning which identifies what is invariant between the two. Let me give you an intuitive example. In the sentiment classification case, there are certain words which are invariant across domains: they are positive or negative irrespective of the domain. Excellent, best, good; these are always positive whatever the domain. You have to identify those; they are called pivots, or invariant pivots. Once you identify such invariants, which may be different for different languages (I'm just giving the example where the feature distributions differ), then with respect to those features you can identify which other features are going to be positive or negative. Okay. Hi, my name is Uday. You showed social media analytics tools, so basically two questions. First, while building the model, did you take natural language processing into account, and if yes, how much was the weightage? Let me make sure I understood the question; are you asking how we decided on the weights of the instances? Yes; in social media the language people write is very difficult to interpret, so generally we use natural language processing to identify the whole meaning of a sentence.
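To make the pivot intuition above concrete, here is a tiny illustrative sketch. This is my own simplification, not structural correspondence learning itself: it merely counts how often each word co-occurs with an assumed set of domain-independent sentiment pivots, which is the raw signal SCL builds its correspondences from.

```python
from collections import Counter

# Assumed pivots: sentiment words taken to be polarity-bearing in any domain.
PIVOTS = {"excellent", "best", "good", "terrible", "awful"}

def pivot_cooccurrence(docs, pivots=PIVOTS):
    """Count, for each non-pivot word, the number of unlabeled documents
    in which it appears together with at least one pivot word.
    Frequent co-occurrence hints that the word correlates with sentiment
    in this domain, even though it never appears in labeled data."""
    cooc = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        if words & pivots:                  # document mentions a pivot
            for w in words - pivots:        # credit the domain-specific words
                cooc[w] += 1
    return cooc
```

For instance, in electronics reviews "battery" may co-occur often with "excellent" or "awful", flagging it as a sentiment-bearing feature specific to that domain.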
So did you consider that while doing the modeling? Well, this linguistic pre-processing is needed irrespective of whether you use transfer learning or not: how to remove the noise, how to canonicalize and tokenize the text, how to make sense of smileys and things of that nature. I am not even getting into those; they are needed, and you have to do them for every domain. Once you have done that, the question is how you identify which source domain instances are more similar to the target domain, and that is where this algorithm comes in; you can refer to the paper for the details of how we do the weighting. Thank you. When you say you do the transfer learning and then check whether it is satisfactory or not, how do you identify that? Because the new data is unlabeled. Good catch. We assume there is a validation data set; once we have that, we can check satisfactoriness. In case we don't have one, the analyst, or whoever is actually analyzing the data, has to go through some examples manually, as you saw in the demo, and say whether the output looks good or not. How much of that you can do depends on how much manpower you have. So once the results are ready with the transferred model, you randomly pick something, an analyst analyzes it, and basically one more round of validation with manual intervention happens there as well? Well, that's an optional step. If you do not want to do it, I can do the one-shot transfer, similarity-aware and all the things that we are doing, and it will give you the prediction.
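On the instance-weighting question above, the actual scheme is in the paper; purely as an illustration of the idea, one could weight each labeled source instance by its similarity to the target collection, for example like this (an assumed scheme on my part, not the published one):

```python
from collections import Counter
import math

def bow(doc):
    """Bag-of-words counts for a single document."""
    return Counter(doc.lower().split())

def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * q.get(w, 0) for w, c in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def instance_weights(source_docs, target_docs):
    """Weight each labeled source instance by cosine similarity to the
    pooled bag of words of the (unlabeled) target collection, so that
    source instances resembling the target count more during training."""
    pooled = Counter()
    for doc in target_docs:
        pooled.update(bow(doc))
    return [cosine(bow(doc), pooled) for doc in source_docs]
```

These weights would then be passed to any learner that accepts per-instance sample weights, so that off-domain source examples contribute less to the trained model.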
But what we find in practical settings is that people are okay with validating some predictions, and mind you, validation is less time-consuming than actually assigning labels: you are checking whether something is correct, versus having to categorize it into five classes. So the validation step is helpful in that way. I take your point: yes, the manual intervention is needed, but firstly it is optional, and secondly it is less expensive in time and effort. Hi Shwarya, here. Great presentation. I wanted to ask, is there a possibility of extending this beyond classification problems? Yes, there is; is there a specific setting you have in mind, or do you want me to continue? No, sure, go ahead. Well, regression is a definite possibility, and some work has been done there, but only in the supervised setting does it make sense; it won't make sense in an unsupervised setting because the labeling effort is not there. In fact, we are currently working on transfer-aware deep networks; it is premature, which is why I didn't mention it. Can you train networks which are transfer-aware? Deep learning is awesome, but you still need labeled data, plenty of labeled data. Can you make the networks transfer-aware? That's something we are working on, so I am not going to go into it, but yes, it's possible. In any supervised setting you can apply transfer: regression, ordinal regression, classification, anything. The philosophy very much carries over. Thank you. Hi. Here. It's a follow-up question to what he asked. Assume I am satisfied with my validation set, so I am happy with the learning on the target domain. Does empath relearn on this set? You have already learned something on the source domain, and that model is there.
So now consider the validation set as a new labeled set I am passing in; is there any feedback loop there to update the model, so that in future my predicted scores would improve? It's possible; we are not doing that. Okay, so empath doesn't do that. Thanks. Hello. Good question, thanks. Thanks for the great presentation. Just a continuation of that scenario: let's say the validation fails, what happens next? Of course, you ask them to label some data. So how much labeling is enough? Oh, that's one of those magic machine learning parameters. We saw that about one-tenth of the initial training data that people used for the source domain is a good number. But you can start off with very little; here we are talking about training data sizes of a few hundred, so you can start with a couple of tens, 20 or 30 examples, and see. And given that this is an iterative procedure, you don't have to provide a whole lot of examples at the very beginning; you can try with a small amount and see what changes. There is no specific number I can suggest, but one-tenth is the ballpark we go with. So when do you actually say that the model is ready? After the analyst gives some examples, you adapt and test, and then check whether it is satisfactory. As a recommendation, we say: give 25 examples, for instance, then train and see the performance. Don't give too few, because the result won't be good. And our contribution was in how to select those 25 examples, which is where we used active learning. All right, okay. Thank you. Hello. It's here. I have a question: when you tag tweets, we can't say a tweet is constrained to a single topic. If we consider a tweet about an iPhone, for example.
They may say that the camera quality is the best, but that the battery is draining, so we can't tag the tweet directly as a positive thing or a negative thing. This is a very good question. This is a subclass of the sentiment categorization problem called aspect-based sentiment; again, a very well-researched topic which we did not look into here. The way I would brush this aside is that I can also model it as a classification problem: instead of a positive/negative classifier, I can build an aspect-oriented sentiment classifier to address it. But yes, that's a good point; it would be interesting to see whether transfer learning works in those fine-grained scenarios as well. We didn't do that. So that is the iterative part: when I say slow, I use the word more or less in lieu of iterative, iterative in the sense that you can gradually feed in some examples to improve the model over a period of time. So actually what you're doing is one-shot transfer plus active learning, which you're terming slow, iterative transfer? Well, with the active learning examples I'm still doing transfer learning. Transfer learning can happen in the presence of no labeled data or in the presence of a small amount of labeled data, as I introduced earlier. So when the active-learning-based samples are given, I'm not learning the classifier based only on the examples given through active learning; I'm learning from the source domain collection plus the small amount of labeled data to obtain a better classifier, and that is why it is right to call it iterative transfer. Have you also seen this kind of phenomenon with small data? When you have a very small number of training items, you go ahead and use another domain's data; how is this different? I didn't understand the question; what do you mean?
For example, if I have a very small amount of training data, instead of training the classifier with that data alone, I go ahead and supplement it. You said you annotate it? No, I don't label it; rather, I bring in another readily available domain's data and try to classify with that. Wouldn't that be very similar to learning in the presence of a small amount of labeled data and a large amount of unlabeled data? Actually, I try to train the model using both data sets: the small labeled data plus a similar domain's data. I would say that is closer to the semi-supervised learning scenario, where we have a small amount of labeled data and a lot of unlabeled data. The variation you are bringing in is that the unlabeled data is not from the same domain. If you are doing something to account for the fact that this unlabeled data is from a different domain, then the two are similar. But if you are just bringing in another domain's data because it is similar, I would call that semi-supervised learning, unless you do something about the domain difference: for example, the feature-space mapping that we do, to ensure that the classifier training happens in a space where the new domain's data is similar to the source domain's data. If you do something like that, then it would be transfer learning; otherwise, to me, it is more like semi-supervised learning. I hope that answers the question. Okay, I think we'll need to intervene here. Sure. We don't have time; thank you very much. You can take it offline.