How many of you know about Yodlee, by the way? Show of hands? Not bad. Oh, of course you guys know; I have a whole bunch of team members here. I'm not going to start for another couple of minutes, so this is just to warm up: my vocal cords get warmed up and we have a good conversation. I have 45 minutes for this talk, but what I typically like to do is make it a two-way conversation. The content I have, which I was preparing until a minute ago, could fill anywhere from 15 minutes to five hours of conversation around these 45 slides. So I would really like this not to be a one-way channel, especially because we've all had some very good lunch. I will pause whenever you want me to pause, and comments are welcome, not just questions. If there are comments you would like to make on what I'm talking about, they are more than welcome. I know Rohit will have a lot to say, so Rohit, please chime in, and likewise my team here from Yodlee: if you have any comments, or if you think I made a mistake, please call it out. Likewise, if you want a particular thing to be covered: you all, I'm guessing, read the abstract, so if you have expectations about what should be covered and what is less important, let me know and we can do that too. Okay. Is it time? Okay. So thank you everybody for showing up. I am Om Deshmukh. I am a senior director, and I run data sciences for Yodlee. What I want to do today is talk about this lengthy topic, which essentially asks: how do you take a machine learning solution that does a wonderful job in a lab setting and actually deploy it so that somebody pays for it? That's essentially the point of the talk.
So before I go into the talk, I thought I'd give you some context on who I am, what you should expect in the next 45 minutes, and what sort of color I'm going to bring to this talk based on my background. I have a PhD in machine learning as applied to speech recognition, from about 12 years ago. Then I was in a couple of industry research labs, where I had a good chance of working with wonderful team members, which led to good recognition in the research community: research papers, patents, a lot of best-paper awards, and so on. But about two and a half years ago I made a big switch, and the line that got added to my introduction reads: delivered, in live production, a deep data science solution, at scale, under very strict SLAs. "Live production" means any change you make is visible almost instantaneously. "Deep data science" means it's not just rules, not just a simplistic model. "At scale" means that today one of our solutions runs on a few tens of millions of unstructured financial text transactions. And the SLAs are very strict: those tens of millions of transactions have to be processed within an hour or two, otherwise the people who are paying may not want to pay. All of this we've been able to deliver, and I'll talk about the challenges in moving from the lab part to the production part. That's the focus of the talk.
What I also wanted to say is that while I've spent about ten-odd years in this field (ten-odd years here and five-odd years there), I like to believe I know maybe 1% of whatever machine learning exists, maybe less than 1%. The good news is that this is enough, actually more than enough, to deliver at-scale, real-time, highly stringent SLA-based solutions that are truly big data and truly valuable, to the extent that people will pay for them. That's one message I want to get across. Any questions, comments? No? Okay. I've also been lucky enough, in all of my career so far, to have wonderful team members. Some of them are sitting here; please raise your hands. If there are questions after the talk and I'm not available, you can talk to these folks. Okay. Now, as you move from developing a POC within a relatively closed, relatively secure environment to deploying it, the first thing you have to worry about is: who are my stakeholders? For the folks who joined late, the comment I made earlier stands: these are my views, and if you have not just a question but a comment, whether you want to challenge something or support it, please feel free. I want this to be a conversational event rather than one-way, with me just talking. Okay. So who are the main stakeholders? The first one is, of course, the data science team. I'm guessing many of you here have interacted with a data scientist or are yourselves data scientists. When a data scientist builds a solution, what is the first thing he'll say? Folks on my team, please don't answer. Any guesses as to the first instinctive thing a data scientist would say? Thank you. Exactly.
He's going to say: I am using this cool technology, which gives me 95% accuracy. And I did that too. In the beginning of my career I used to be highly boastful about the accuracy I was getting on a particular solution, which definitely wasn't 100%, but still. Now, who is the other stakeholder? The customer-facing teams. And what is their first instinct? They have what I call, and I totally came up with this term, the Sachin Tendulkar expectation syndrome. Every time Sachin comes in to bat, the first ball has to be a six, and he should not get out until he hits a century. These teams have that kind of syndrome: no matter what the solution is, no matter what the expectation is, it has to be 100% accurate. Anything less than 100% is not acceptable at all. That's the second big stakeholder you need to manage. The third stakeholder is the executive team. They always worry: are we missing the bus? There's a new cool technology that has come out; are we using it, and is there value for us in using it? So how are we managing this? I am, of course, largely focused on the data science side, with a lot of experience there. One thing that has worked wonderfully for us, so far at least, is helping the data science team get the full picture of the problem they're trying to solve: why 99% is not good enough in some cases, why 83% is just about enough in some other cases, and so on. It has worked wonders. Giving the data science team the full picture of the problem they're solving really helps them step up, and it lets a different kind of solution emerge. The second thing is to give them freedom to explore cool things.
A lot of the time, as you're exploring cool things, a new solution comes out that is applicable to a problem the business is willing to pay for. For the folks who were at yesterday's talk that some of my team members gave: we talked about a lot of work in reinforcement learning. There's a tiny piece of that which we are actually using in our work, but a whole lot of the work happening there has only peripheral application to what we're doing. So that's the sort of cool work we bring in. Okay. Then on the customer-facing side, it really is just: educate. Educate the customer-facing teams on how the data science world works and how it is different from a typical software development world, offer to collaborate (I'll talk a little about that), then educate a little more, and keep doing this until you hit a winning strategy. And on the executive side, to a large extent it's education as well, plus talking about the virtuous cycle of data. What I mean by that is the following (and again, all the images are from Google search; I just knew what to search for, I didn't build these images). What we have seen is that the rich get richer. What I want to highlight in this picture is that each of the points on the circle is a comparative word: more, or smarter, or better. What that means to me (it wasn't in the actual article that had this figure) is that you can start on this circle at any point. It doesn't say x users or x-accuracy algorithm; you can start at any point, and as you go around the circle, you get better and better and better. Eventually, though, and this is my point of view, as the machine learning systems become a lot more stable than they are today,
More data will be the main differentiator. Notice that I'm not using a quantitative word for what "more" means; it's a qualitative word, and I'll talk a little about what "data" really means and what "more" really means in this case. Okay. The second thing is: how do you educate people about the elusive 100%? Google has been around for many, many years. They are undoubtedly the best at search and a lot of the other deep machine learning work that happens. But when you search for "jaguar", say, you get this. How many of you believe this is correct? I was expecting a lot more of you to say it, but okay. It depends, exactly. If this is what you were expecting, then this is the correct result. Or maybe you really meant Jaquar and just got the spelling wrong; even then you're going to yell at Google that it didn't find the faucet you were looking for. So that's why even big, mighty groups do not have 100% accuracy. It depends on how you define 100%, really. The only place where it truly fails is if you make a mistake like this: you search for jaguar and give a result like this. Those are the kinds of things that shouldn't happen, and if they happen and you get yelled at, yes, you deserve it. But otherwise, "100%" can be something other than the literal 100%. Okay. The second point is consumer AI versus enterprise AI. What I mean by this is: when you go to Facebook or any of your social media, or any of the other services that are actually free, your expectations are high, yes, but the fallout of a wrong output is not as bad. For example, you took your wife out to a movie; the review said it was a highly romantic movie, and it turned out to be an action film. Okay, that's not so bad.
Yes, you will go hungry for a couple of days, but that's probably not as bad as what may happen when the stakes are much higher. There are a couple of examples I'd like to give. In one, a machine learning algorithm misfired for only 15 minutes, just 15 minutes of one day, on one trading platform, and that led to a loss of about $400 million. That's huge, really huge. The other example: an image recognition system misfired, and an innocent woman was stopped and held at gunpoint because her number plate was misrecognized as belonging to somebody on a most-wanted list. Those are the kinds of enterprise AI problems you need to solve, where the customer-facing teams really have a lot of pressure when they're talking to the customers, and hence the kinds of things that work or don't work on the consumer AI side need not necessarily work there. Okay. What that means is you need a much more holistic approach: an end-to-end solution that a customer is willing to pay for. You have to be crystal clear about the exact value the AI solution is bringing, and then be able to call it out: it'll do this, and the rest is outside its purview. That is where your internal stakeholders will be very, very appreciative if you're able to call that out. And then there is this piece, which we've had to struggle with quite a bit, not just in the current setup but also in some of the research labs: you can't optimize all three of cost, accuracy, and functionality. You hold two of them constant, and the third one has to move for the solution to be feasible. Okay. So how were decisions typically made before the emergence of big data? Do you know the concept called HiPPO, the highest paid person's opinion?
Typically that is what used to happen, and for a rightful reason: intuition was largely what drove decisions, intuition came as you got more and more experience, and as you got more experience you would hopefully get paid more and more. So HiPPO was a reasonably good way of doing things in the pre-data environment. Now what happens is that the HiPPO, the person in the position of authority to make a decision, should only formulate one or more hypotheses. Then that person or group, before driving the action, has to go through these steps. First, figure out what data is needed to evaluate this hypothesis; I'm saying evaluate, not validate. Second, how do we get these data sets: are they available in the current setup, what is the time taken to get them, and all the related questions I'll dig deeper into in a little while. Third, what are the ways we can formulate insights based on this data? And only then provide expert interpretation of that data. What that also means (okay, I almost forgot about this, good) is: look at the data, keep your biases aside for a minute, and then make a decision. The other very important interpretation of this is: do not let data or machine learning alone make the decision. There has to be a human in the loop. It's very, very important.
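The "evaluate, not validate" step can be made concrete with a toy example. Suppose the hypothesis is that some change improves a conversion rate; a one-sided two-proportion z-test lets the data weigh in before the human in the loop decides. This is only an illustrative sketch, and all the numbers below are hypothetical.

```python
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """One-sided two-proportion z-test: is variant B's rate higher than A's?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))              # 1 - Phi(z), one-sided
    return z, p_value

# Hypothetical experiment: 120/2000 conversions without the change,
# 165/2000 with it.
z, p = two_proportion_z(120, 2000, 165, 2000)
```

The point is not the particular statistic: it is that the highest-paid person's hypothesis gets scored against data before, not instead of, expert judgment.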
Okay, so these are the nine steps we've come up with. This is by no means final, and definitely by no means guaranteed correct; we are working on it, so any input you have would be highly appreciated. One is, of course, know exactly what problem to solve; I'll talk a little about this. Then: where is the data, what are its limitations, and what are the ways you can scrub it? What is the AI/ML framework you can use? Data swimming, a term we came up with, which basically says: know your data intimately before you even make any predictions; I'll talk a little about that. Represent the data so that it can be used by your machine learning setup. Train for generalization; I'll talk a little bit about this. Optimize for deployment. Know the end-to-end picture and the touch points with your customer: the customer is definitely not interacting with your machine learning solution in its raw form; there is some transformation happening, some interface the user is interacting with. And, of course, provide feedback. I'm not going to cover all nine steps; I'm going to cover the AI/ML framework aspect and the generalization and deployment aspects. Data swimming is abstract enough that it would take about 45 minutes by itself, but I'll touch on it as we progress. So, the question was whether this problem really is only one line: get me more money. Of course, but that is the business problem. I like to say that a good researcher is somebody who solves a given problem; a great researcher is somebody who formulates the right problem. And this is really the first piece, where you need a good team that can take this business problem, increase my revenue, either top line or bottom line, and formulate it into a functional problem. What does that mean? Do I have to build a UI, or do I have to reduce the time it
takes, or do I need higher accuracy? And then formulate that into a data-driven problem. At Yodlee we call this the discovery phase, which means that when an internal customer comes to us and says solve this problem, we say we'll first discover. What does that discovery mean? First: is the problem solvable from a data perspective or not? Are there patterns, is a human able to solve it, do we have enough data? Second: do we have the kind of infrastructure we need to solve the problem, or does the infrastructure cost much, much more than the kind of money we would make? Unless the answer to both of these is yes, the discovery phase ends by saying no, this problem is not solvable, and we will not solve it, at least not in a data-driven way. Does that answer your question? Okay. What we also used to do, back in some of our research labs, was hold dreaming sessions with our clients. We would sit with them and have them tell us their pain points, how they are currently solving them, and so on. That touches a little on the data swimming part, but before getting the data we're trying to figure out the lay of the land. This question? Yes. Okay, so the question was: a lot of the time, business teams come to the data science group and say, I know exactly what problem needs to be solved, here's the data, go solve it. How do you address this? The way you address it is not a technical conversation; it's more of a soft-skills conversation. Sorry, I shouldn't be making fun of that question; it is a very important one, but what I'm trying to say is there is no one direct way of answering it. A couple of things have worked out well for me over the past ten-odd years. One: have a conversation, largely so that they trust you, so
they know this team knows what it's doing, and they'll open up. Two: be genuine and ask them probing questions about what will happen when this problem is solved. That has also worked out very well for us: we take the problem, but then we genuinely ask why solving it will lead to the customer upselling, cross-selling, or buying something new. Yes: you might be dealing with, say, an IT person, whereas the real problem is at the business end, and the IT person doesn't have access to the full data or full cooperation from the business side, so they want to show something possible to the business side before really taking you the full way. That's one thing I've seen. The other thing I've seen is that maybe they're just giving you a small problem as a test, to see how good you are. I've seen those things too, and one way to really address it is to try and figure out why they're restricting the scope, and based on that, figure out how to approach it. Yes, that is definitely a challenge, and events like this one, where not just data scientists but a lot of business-facing teams come, are great places for that kind of awareness to be spread. Absolutely. Okay. I was going to skip, or largely gloss over, what the ideal data looks like, and come to the fourth one: how do you formulate the AI/ML framework you want to use? This is really a difficult problem to solve; there is no textbook solution to it. It comes through extensive training in the area of data science, for example basic things like
knowing which regularization will work for me, which kind of model will work for me, which models are applicable to text data, which to multimodal data, and so on. It's not something tangible where I can put my hand on it and say, go follow these five steps and you will get the answer, although I'll give you some ideas. The other problem we have seen has two extremes. One extreme is good data science papers, which talk about how an algorithm works and the math behind it, but won't necessarily go into which areas it is applicable to or what the nuances are. If you take a KDD paper or a NIPS paper and try to implement it, you will very likely not get the accuracy they reported, because there are a lot of things that are not talked about, and rightfully so: the paper is only so many pages and can't cover everything. I'll cover a little of that here. The important thing to worry about in an enterprise big data setup is that your solution has to be generalizable and it has to be scalable; I'll give you some ideas on how that has worked out for us. Okay. This also ties in with how you need to manage your stakeholders. Typically what happens (this is just a schematic, again taken from Google search) is that, of the resources you expend, the initial foundational work is really the crucial effort, because after a while performance hits an asymptote, and that asymptote depends on the foundation. Now, this is true of a software development setup too, but not as much, because in software development, once you fix the API handshake, I'll give you x inputs and get y outputs, the work is pretty much done; that black box can be used wherever you want, with no cascading effect, if you will. In the machine learning setup that's not the case. What you do in the foundational layer matters: for example, somebody gives you
data, you have to fit a model to it, and you decide for whatever reason that it's a third-order polynomial. You fit it, you spend a lot of time doing that, and as you get more and more data, no matter how much effort you put in, you reach that asymptote. You have to go back and critically evaluate whether the third-order polynomial was the right thing to do or not, and that brings you back to square one. The choice determines whether you hit the asymptote early or at a much higher value, so it has an irreversible impact on when you hit it and where you hit it. That's why it's very important that the foundational part is given a lot of time and a lot of due diligence. Okay. Broadly speaking, in that foundational layer you need to think about two things: the subjectivity that will come into your models, the objectivity, and the ways in which you can control each. Again, there is no deep data science in this, but these are things that typically don't get discussed as much. The main objectivity that comes into your machine learning model is the cost function: how are you penalizing the machine through the loss function? There is a variety of cost functions you can use; I've given three very simple ones: a linear function, a stepwise function, and some sort of piecewise-linear or mixed function. In all of them, the x-axis is the deviation from the expected output, and the y-axis is how much you penalize the machine for that deviation. Now, what does linear say? Any small deviation gets proportionately penalized, so the machine is being made to learn that everything in the output has to be exactly the same as what it saw in the input. That's not always a good thing, because sometimes you want to have some leeway so the
machine can learn generalizable solutions. So under a linear cost, any small deviation is penalized. Stepwise functions typically say that rounding-off errors are okay, and if that's okay for your problem, if your business unit is okay with it and your customers are okay with it, then maybe that is the function you should be using. Again, these are all fairly obvious things, but in the rush of the moment, when you're looking at some cool algorithm, they take a back seat and come back to you later. The third is, of course, the mixed function: if you are tolerant up to a particular limit, and the business, or you yourself based on the domain experts you have, believe that's okay, then maybe that's the function you want to use. The reverse is also possible: you use a function like this and then realize that a lot of the negative publicity you got came from the region you thought was tolerable, and it's not; then it's time to go change your loss function. Okay, that's one aspect: the objectivity. The second is the subjectivity piece. The machine is only as accurate as the data fed to it. It can generalize, yes, but still, some of the things it learns are based on what is being fed to the system. In this case, if the machine has to distinguish between red and green, it's easy for the humans who provide the labeled data to provide highly accurate labels. In a typical supervised setup you would have some human or some expert giving you inputs like this. The things you want to measure are repeatability and reproducibility. What I mean by repeatability is: the same task is given to the same human expert at two different times, and you see how much agreement there is. Same task, same human, two different times. That tells you about the complexity and the
subjectivity of the task. Reproducibility is: the same task is given to two different experts in the same setup, so no external ambient variations come in. Same setup, two different human experts, same problem: how often do they agree? That is reproducibility. That gives you an upper limit on what to expect from the training data you are getting, and it will also tell you how much generalizability you need to bring in; one way of doing that is changing your regularizer, and so on. Those are the kinds of things you need to worry about. As the task gets complex, like deciding whether a color is navy blue, dark blue, or violet, humans themselves are going to be more confused than in the previous case, and hence it's reasonable to expect the machine to be a lot more confused too than when the task is simple. So be very particular about these things. What that also means is that when you pose the ML problem to get the labeled data, be it through your in-house labeling teams or through any of the crowdsourcing platforms, you have to be crystal clear about the kind of problem you are posing. In the literature this is also called priming: how you prime the user to provide feedback has a lot of influence on the kind of labels you will get. Okay. Now comes the domain expertise and flexibility piece. Let's say I gave you only this data, some x-axis, some y-axis, these data points, and I said: fit a model to it. A reasonable person would say, okay, I'll do a linear model, which is a reasonable thing to do. Occam's razor: whatever is simplest, let's just use that, and in this case this happens. Now let's say a human domain expert comes in and says: wait a minute, the x-axis is really the magnetic field intensity and the y-axis is the amount of magnetization you have. Would you still do a linear model? Probably
not, because now the human expert has told you that there is going to be some saturation. You haven't seen that data yet, but soon you will, and because of that human expert feedback, you would instantly say: okay, I'll do a piecewise linear model. That's where a lot of synergy has to come from the human experts. We have often seen data scientists go off to the other extreme and say: we don't need any human experts, there is this cool latest paper that claims 99% accuracy, I'm just going to use it blindly. Oftentimes it will fail. Now, extending this magnetization example: a lot of the time the data may completely shift in the n-dimensional space you're looking at, where the human expert is either a little late to come and tell you, or the human expert himself or herself wasn't sure how the data was going to change in this whole dynamic world. There, what you need is flexibility in the model, because as you see more data, your model has to be flexible enough to systematically incorporate the new data being seen. Somebody just before the talk began asked me about what I took to be a transfer learning kind of problem, where the data you saw in training wasn't necessarily the kind of data you are getting in testing. A lot of the flexibility comes from model building; the kind of model you use provides the generalizability, and I'll talk a little about some of the work we're doing in that area. So let me motivate that flexibility problem slightly differently. How many clusters do you see here? Two, typically, right? Okay. As you add more data (I don't know if you noticed, but I added a little more data) you still see two. But remember, you fixed K to two, right? I at least saw people nodding their heads for two.
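To make the fixed-K problem concrete, here is a minimal sketch in pure Python; the 1-D k-means and the data are made up for illustration. A model frozen at K=2 looks fine until a third group of points arrives, at which point its within-cluster error balloons and K itself has to be revisited.

```python
def kmeans_1d(points, k, iters=20):
    """Minimal 1-D k-means (k >= 2): quantile init, then assign/update steps."""
    pts = sorted(points)
    n = len(pts)
    centers = [pts[i * (n - 1) // (k - 1)] for i in range(k)]  # spread-out init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in pts:  # assign each point to its nearest center
            groups[min(range(k), key=lambda j: (x - centers[j]) ** 2)].append(x)
        # move each center to the mean of its group (keep it if the group is empty)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    # within-cluster sum of squares: the total error of the fit
    return centers, sum(min((x - c) ** 2 for c in centers) for x in pts)

old = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]   # two clear groups: K=2 fits well
new = old + [10.0, 10.2, 10.4]         # a third group arrives later

_, err_old_k2 = kmeans_1d(old, 2)      # small error
_, err_new_k2 = kmeans_1d(new, 2)      # K frozen at 2: error balloons
_, err_new_k3 = kmeans_1d(new, 3)      # revisiting K restores the fit
```

On millions of samples, every such re-run with a new K is a real hardware bill, which is exactly the motivation for models where the number of clusters is not fixed up front.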
Now what happens? There was more data, which came in here, and here. Now at least half of the people who earlier said K equals two are, if not outright saying no, at least in the doubtful region that K may not be two. What would a typical machine learning person do? Okay, change K from two to four or five and rerun the entire thing. That's fine with a small sample; if you have millions of samples and somebody is charging you for the hardware you're going to use, you will definitely think twice. And that's where you go back and look at how humans do it, and what machine learning techniques can match that. A kid sees this: that's the first vehicle he or she sees. Then sees one with three wheels, but knows it's a vehicle. Gradually sees one with two wheels, and it's still a vehicle, the same category; in his or her mind, two clusters have formed. Sees this, doesn't say it's not a vehicle; knows it's still a vehicle, but a third cluster of vehicle. Then sees this, eventually, and doesn't reject it either: okay, one cluster, two clusters, three, four. Very systematically, a human has learned this. Humans can self-teach flexibility of parameters. There is a machine learning framework called Bayesian nonparametrics which helps with exactly this. (Ten minutes? Thank you.) Actually, "nonparametric" is a misnomer; it is really going from a rigid set of parameters to hyperparameters, which then let you play around with the kinds of values the parameters can take. Some of the popular techniques there are GPs, CRPs, and IBPs. The names are very odd, but these are some really cool things you can do. If you don't know the number of clusters your data is going to have, the CRP (Chinese Restaurant Process) is the
technique to use. If you don't know the kind of correlation your two data points are going to have, GPs are the technique to use. And if you don't know the number of dimensions at which a particular data set is going to be separable, the IBP is the technique to use. OK. So people know about PCA and SVD; in those techniques you have to know beforehand how many dimensions you're going to project your data onto, right? Now, what happens if the discriminating dimensions change, typically increase, as the amount of data increases? You have no way of knowing beforehand what that dimensionality is going to be, so you cannot decide up front the dimensionality at which to project your data. That is where IBPs come in: they let you go up to a countably infinite number of dimensions. Both words are very important, countable as well as infinite. Similarly, CRPs let you go up to a countably infinite number of clusters, but in a way that is still bounded: in practice the number of clusters grows very slowly and will never reach even twice the order of magnitude you started with. And we do quite a bit of work with CRPs and GPs to adjust for the dynamic nature of data. OK, generalizability. I'm going to skip the other steps; sorry, I'm not covering the data steps, and I'll go to generalization and optimized deployment. Yeah, OK. Folks on my team know that no talk Om gives is complete without a power-law figure. When you handcraft rules, they are sufficient for the head of the data, the most commonly occurring patterns, because there are only a tangible, countable number of cases there; you can write rules, and that's fine. But the real value, the real money, is here, where you want generalizability so that you can get to the long tail as quickly and as extensively as you can. That's the problem of exploring the long tail: how
efficiently can you explore it? Netflix has this problem; everybody who has big data has this problem. YouTube has it, Google has it. How do you recommend a YouTube video that nobody has seen? You don't even know whether he or she will like it or not, but at the same time you don't want the exploration to go too far. That's the problem of exploring the long tail, and that's really the generalizability aspect. There are ways in which you can do this; again, in our setup we've come up with some ways of doing it. Now, the computational efficiency piece. Netflix gave a million-dollar award to the group that came up with the best way of recommending DVDs, right? It turns out they did not use that solution. To be fair, this was not the only reason, but it was one very important reason: the engineering costs were just prohibitive. The other reason was that they moved from DVDs to streaming, and so on. Still, this highlights the kind of problem you can have: the solution is wonderful, data science has done an excellent job, but it is not deployable because it is cost-prohibitive. And that's where, again, what we've done is figure out ways to balance the complexity of the model against the generalizability, or the spread, of the data that you have. What I mean by that is, in hindsight, very obvious. Data comes in for prediction. Instead of calling the machine learning model every time the data comes in, if there is a power law like this, and at scale there will almost always be a power law, why not do this: only call the machine learning algorithm if the input has not been seen before. Even the first time it is seen, make a decision about whether it should be added to a relatively cheap lookup database; there are different ways in which you can formulate that problem. If you add it, then the next time the data comes in, if it is present in your database, it's a lookup operation. Now, of course,
you're paying for the cost of RAM, or whatever other kind of memory you want to use, but that is the kind of critical big data engineering evaluation that has to happen when you want to make the solution feasible. It also helps a lot of the time in meeting your SLAs, because in operations you can increase the RAM size relatively easily and get the lookups done quickly, rather than having a complex algorithm called over and over again. OK, so again, these are the kinds of things that don't typically get talked about in any paper. Yes? Yes, good point. The question, or the comment, was: what happens when the context changes? Absolutely; that was the ninth block I had in the steps, the feedback loop. The lookup assumes that the context hasn't changed and the machine learning model hasn't changed. If the context changes but the machine learning model hasn't learned anything new, the output is still going to remain the same. So when the context changes, you need a feedback loop that says: the context has changed, the model has to be retrained, and then the lookup table has to be changed. Absolutely, yes. OK. (Five minutes? Yeah, OK.) So now I'll talk about the UI/UX part. You have to know that the solution you're working on is just a part of the whole system, of course not the whole thing; a lot of times it's a tiny piece. So what you need is for the UI/UX to minimize the exposure of your weak points, and the big guys do this very well, right? I'll give you an example. You search for the Prime Minister of India on Google. Google is highly confident about it, so it gives you the answer right away, right? Now, I searched for where the MPs of India meet. We all know it's the Parliament, but Google wasn't very accurate, or very confident, and so it said: OK, I'm not going to give the one answer and
sort of create a foot-in-the-mouth situation; I'll give all the possible answers, I'll just give a hundred thousand different search results. And how many do we have here? Yeah, we have quite a few, and of course they are ranked nicely, and the first one is the correct answer, and so on. What I'm trying to highlight is that the interface accounts for the confidence that the machine learning system has. This also works out wonderfully in spell-correction software. If the word is "th", it automatically corrects it to "the"; it won't even ask you. If it's something like this, which I created this morning on my laptop, it has reasonable confidence that it is "visit place", so it recommends that. Now I do this, and it has no confidence, so instead of creating a bad impression it says: I give up, I'm not going to make any suggestions. Right? That's where the UI/UX has gelled wonderfully with the machine learning algorithm and the end goal that it had in mind. Yeah, OK. Now, a sort of self-advertising: what are we doing with this kind of work? I'm going to show you some examples of the descriptions that come up when you swipe your credit card or make any kind of financial transaction. This one says you purchased something in Chicago, Illinois, in a Walmart. This one says you went to Walmart again, just "WM". This one says you did something on PayPal. This one says you used PayPal to pay a merchant called ParkMobile. This one is slightly interesting: although it may look like you went to Walmart for Walmart shopping, no, you went to Walmart to go to the H&R Block there, which is a tax-related service. This one is a Hulu merchant; now, if you had only a keyword "HULU", it would misfire here, because there is a small grocery called Chihuahua, and a simple "HULU" keyword-based approach would misfire on it. These are all real problems.
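The confidence-aware interface behaviour described above, auto-apply, suggest, or give up, can be sketched as a simple gate. The thresholds and the two-tier policy are illustrative assumptions, not the actual implementation behind any of the products mentioned:

```python
def ux_action(prediction, confidence,
              auto_threshold=0.95, suggest_threshold=0.70):
    """Decide how the UI should surface a model prediction.

    The interface action depends on the model's confidence, not just
    on its top prediction, so low-confidence outputs are never shown
    as if they were certain.
    """
    if confidence >= auto_threshold:
        return ("apply", prediction)    # e.g. "th" -> "the": just do it
    if confidence >= suggest_threshold:
        return ("suggest", prediction)  # e.g. offer "visit place" as a hint
    return ("give_up", None)            # no confident answer: show nothing
```

The thresholds would in practice be tuned against the cost of a wrong auto-correction versus the cost of showing nothing.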
The kind of work I talked about, we've applied to millions of these transactions, with largely two goals in mind. One is to predict the merchant. The second is to predict the category: this transaction is related to tax; this one is in the transportation category, because it's parking; this one is general merchandise, because Walmart sells general things you would like to buy. Now, with the baseline, which is also a deep-learning-based baseline, the accuracy for categorization was about 84 to 88 percent; it went to about 93 percent. We also got scale in the merchants. Initially there were only about two thousand odd merchants in the baseline setup, and with some of the nine steps I talked about we were able to get to three million merchants. Our setup can now take any such transaction and predict a merchant, if it is one of the three million merchants we cover. So the population has gone up, but at the same time, and that's the beauty of the system, the accuracy over all of those merchants has also gone up, from 76 to 92 percent. Two things: the population goes up, and on that bigger population the accuracy also goes up. OK, so that's what I had. Thank you. So the question was: while the transaction has the name of the merchant, would it have the category also? With MasterCard and Visa we sometimes have the MCC code, which is largely a proxy for the categories, but by and large we don't have it, so we still have to predict. Does that answer your question? MCC, yeah, not always. OK, I think we can only take one or two more questions. Yes. Right, so if it is not seen, then yes, by definition and by design it will not go into the lookup. But let's take the example of face recognition. Say it's a train station, which has employees coming in and strangers coming in, and so you
have face recognition software there. You know that there are a thousand employees who come in every day; you probably do not want the face recognition software to run on them every time they walk in. You can probably create a lookup for those, at some parameterized level, so that when those parameters match it becomes a lookup. But for the strangers, which is the long tail, you will have millions of people coming in on a monthly or annual basis, and there the lookup won't help. That's the sort of decisioning you need to do. This is probably the last question, I guess. Yes? [Audience] Generally we have issues with deep learning in that we cannot explain the model; it's a black-box model, right? So in the case where you have to explain it to the client, say you want to explain why a particular prediction happened: one thing we have tried before is to use a deep learning network for feature engineering, and then feed those features into a tree model that makes the predictions, so that you can explain some part of it and get the best of both. In your case, where you have to explain a very high-accuracy model that you basically cannot match with a tree model, what do you do? [Om] We actually have that problem, so we are working on building interpretability into the deep learning setup. I won't have time to go into the details, but you can imagine a deep learning setup as multiple layers with multiple units at each layer. We are trying to figure out, when a particular output was predicted, which path was triggered, and with what confidence. That path tells us there is some association of that particular path with a particular set of characters. In this case, for example... yeah, sure, let me just finish this thought: if the prediction was Hulu, and this path was triggered, then we know these were the words, these were the letters, that
triggered it, and that's the interpretability. I know it's very vague, but we can talk offline. OK, thank you so much.
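The lookup-table-in-front-of-the-model pattern from the talk, including the retraining feedback loop raised in the Q&A, can be sketched as follows. The admission rule (cache an input after it has been seen a given number of times) is one illustrative way to decide what enters the lookup, not the specific formulation used in production:

```python
class CachedPredictor:
    """Cheap lookup in front of an expensive model, for power-law data."""

    def __init__(self, model_fn, admit_after=2):
        self.model_fn = model_fn        # expensive ML model (any callable)
        self.cache = {}                 # cheap in-memory lookup table
        self.counts = {}                # how often each input was seen
        self.model_calls = 0
        self.admit_after = admit_after  # admit frequent "head" inputs

    def predict(self, x):
        if x in self.cache:             # head of the power law: O(1) lookup
            return self.cache[x]
        self.model_calls += 1
        y = self.model_fn(x)            # tail: fall back to the model
        self.counts[x] = self.counts.get(x, 0) + 1
        if self.counts[x] >= self.admit_after:
            self.cache[x] = y           # frequent enough: cache it
        return y

    def invalidate(self):
        # Feedback loop: when the context changes and the model is
        # retrained, the lookup table must be rebuilt as well.
        self.cache.clear()
        self.counts.clear()
```

With a power-law input distribution, most traffic hits the small cached head, so the expensive model is only invoked for the long tail, which is what makes the SLA and hardware costs manageable.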