Welcome. My name is Rohit Loteliker, and I am faculty at Insofi International School of Engineering. Today I am here to talk about a challenge we all face in data science projects: not having enough raw material. I am sure many of you have experienced this. Over the last ten-odd years we keep seeing graphs of data growing exponentially; people talk about data growing some 30 percent every year, and it may well be growing on your system or your hard drive. But when you come to the data science project you want to do, when you want to deliver insights to the business, you always hit this problem: I do not have the relevant data. So am I just getting unlucky, or am I misunderstanding the people who say there is too much data? The problem, I think we all understand, is that data overall is growing very fast, but the relevant data, the data you need for your project, is not growing nearly as fast. That is what is often missing: you might even have a large number of observations, yet not have the attributes you need. So let me talk about what we can do to overcome some of these challenges. I think the growth of data science in the coming years is really going to depend not just on better algorithms but also on acquiring data. I am going to present a case study from a few years back at one of my previous organizations. This was at IBM, where I was previously employed. IBM is a large company with different types of businesses, and one of its main businesses was the contact center business, the business process outsourcing business. One of the things this business did was
to run contact center operations for clients. One of the clients was a large global bank in Australia, and IBM was running the bank's contact center operations out of the Philippines. There was specifically a team responsible for credit card operations: the bank's credit card customers would call into this contact center whenever they needed help, and the agents in the Philippines would service them. Now, I am sure most of you get calls from banks. You might have a bank relationship manager who keeps periodically calling you and tries to sound helpful: "Sir, madam, I can help you with whatever your need is." Do they really want to help you? No, you are the prey; they are trying to sell you something, and that is why they are calling. These agents' job was exactly that: the real job was to sell products. Of course, they would first service the customer, but after that they would try to pitch some of the bank's products. This was a typical bank: these were credit card customers, and the bank had other products as well, savings accounts, checking accounts, insurance, mortgages, additional cards, the usual portfolio. So the agents' job was to sell maybe an additional couple of products. Now, I was part of IBM Research, and the contact center part of IBM came to me and said, "Hey, Dr. Data Science, can you do something fancy and help improve our sales?" I said, okay, let me have a look and see how data science can help. So I flew down to the Philippines and went to the contact center
operations to look at what the agents were doing. The way these agents worked was through a CRM tool showing the customer's information: whenever a customer called in, the customer's profile would show up, and the agents would look at it to get some general sense of the customer. Is this a family person or a single person? What is the income bracket, the age bracket? Then they would engage in a conversation: "Glad to help you, sir. I see you have been using your card regularly; do you find our reward program attractive?" They would get into a conversation and, in that process, try to sell something else. The first obvious thought was that we could build a propensity model. In the CRM system the agents see the six to eight products they can pitch to each customer, and they have to select a first product and maybe a second one to pitch. There was nothing obvious in the tool giving them an idea of what to pitch; they were doing their own assessment of what the customer might need through the conversation, and then trying to find some product they could tempt the customer to buy. So a propensity model was the obvious thing. We had a conference call with the bank in Australia and proposed that we build a propensity model, rank the products, and surface that ranking in the CRM as guidance for the agents on what to sell. What did the bank tell us? They told us they already had a propensity model. Okay, but where does it show up for the agents? The agents did not seem to be
looking at any output from the propensity model. The bank said that in the CRM system the products that show up are already ranked according to propensity. But the agents were completely ignoring it. They did not really care; they had their own way of doing things: "As long as we meet the targets for selling these products, we have done our job." We told the bank that the agents were not really using the propensity model, and the bank said, fine; as long as the targets are being met, the sales team does not care, so why should the bank's decision support team care? So that was the situation, and we really had to come up with a plan B. They already have a propensity model; can we build a better one? It is not clear that we can. The bank has a decent team of domain experts who have been doing this for a while. Why should we, as outsiders from an IT company, be able to build a better model than the bank's internal team? Maybe we could, but even if it were a little better, that does not mean the agents would use it; it would have to be substantially better for the agents to notice a real difference. So we needed a plan B, and since the bank was already leveraging whatever data it had, plan B had to be based on something else. What could that be? I said, let me understand how they are trying to sell. So a colleague and I listened in on the calls, and we learned three things about what the good agents did. First, they would look at the customer profile and try to categorize the customer: this looks like a family guy with a good income; maybe he can be pitched some insurance, or some additional
card for a family member. Then they would strike up a conversation, with a gradual flow, not a hard sell, but trying to address the customer's aspirations. Customers have aspirations; the good agents try to understand those aspirations and position a product to meet them. From the conversation they pick up clues: what does the customer really need, what are their aspirations? Then they match those aspirations to one of the bank's products, something they can tempt the customer to buy. So does data science have a role here, in the sales process, beyond a propensity model that already existed? That is the question we had to answer. If you are going to pitch something new, you first look at what the issue is. Is there a problem in the first place? What can be improved? The first thing was the variability in sales performance. Sales is not easy anywhere, and especially not in a contact center, where you have a high level of churn. These are people just one or two years out of college, maybe only out of high school, and they are facing customers from day one. They are not at all productive at first; it takes months to become productive, because they have to understand what the bank's typical customers are like, what their aspirations and needs are. Eventually they get a sense of the different customer segments and figure out which kinds of products each segment is interested in. This takes time, months. So there is a long learning curve, and then there is a
huge amount of churn; contact center agents, if they are not leaving the company, are moving to a different account, a different client within the company, to learn something different. This is a problem in all contact centers, and in sales in general, and the question was how data science could help. So the first problem was variability: I would say the top 25% were the good, mature agents; then you have the middle two quartiles, and then the bottom quartile. The idea was, can we pull the lower quartiles up using data science? The second thing: these agents were getting into conversations, but were the questions really personalized enough? Could the questions be made more targeted and personalized to extract more valuable information? Let me take an example. Here is one of the questions these agents would ask: "We see you are using the credit card a lot and paying interest every month; do you find the interest rate attractive?" Sometimes they would ask this even when the CRM only showed that the customer paid interest last month, while in the previous 12 months the customer had paid the balance in full every month; the customer was simply late that one month because of some oversight. That is not the best question to ask this customer, because this is not the kind of person who does not have the money to pay off the credit card. But doing better is very difficult for the agents, because in the CRM system all they see is the recent data, the last month's statement. If they had to go back and look at the pattern over the last few months, they simply do not have time for that.
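As a side note, the longer-horizon check just described is easy to express in code. Here is a minimal sketch (the data layout and the four-month threshold are my own illustrative assumptions, not the bank's actual logic): only treat the interest-rate question as relevant if the customer has paid interest in several of the last twelve statements, not just the most recent one.

```python
# Hypothetical sketch: decide whether the "do you find our interest rate
# attractive?" question fits this customer by looking at a year of
# statements instead of only the latest one. The list layout and the
# threshold (4 of 12 months) are illustrative assumptions.

def paid_interest_regularly(monthly_interest: list, min_months: int = 4) -> bool:
    """True if the customer paid interest in at least `min_months`
    of the statements given (say, the most recent 12)."""
    return sum(1 for amt in monthly_interest if amt > 0) >= min_months

# A revolver: interest in most months, so the question is relevant.
revolver = [12.5, 30.0, 18.2, 0.0, 25.1, 40.0, 22.3, 0.0, 19.9, 33.3, 28.0, 21.0]
# A transactor who was late once: only one month of interest, so skip it.
transactor = [0.0] * 11 + [7.5]

paid_interest_regularly(revolver)    # True
paid_interest_regularly(transactor)  # False
```

A rule like this is exactly the kind of pattern an agent has no time to check live on a call, but a tool can evaluate it before the call is even answered.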
Because the CRM data only pops up when the call comes in, there is no way for the agent to really profile the customer; what they see is a very rough, high-level picture. What data science can do is analyze the customer's profile over a longer time period, not just the most recent data, come up with a better profile of the customer, and come up with better questions for the agent to ask. That is what we thought we could do. So instead of the agent coming up with questions, what if we built a tool that could guide the agents on which questions to ask the customer? Think about it: the machine learning here is not giving answers, it is coming up with questions, questions that will be valuable in understanding the customer better. Then, step 2: based on the customer's responses to those questions, can it come up with an improved propensity score, refining the propensity score the bank already had? That would be a much better propensity score. It marries the bank's propensity model with what the good agents were doing: learning the behavior, the best practices, of the good agents and making them available to everyone, even the most recently joined agents. So coming up with the questions and suggesting a product to pitch: those were the pieces, in fact three problems. The first is coming up with a customer propensity based on all the historical data, the customer's credit card spending over the last year or two plus the demographics. The second is coming up with questions to ask. Why just two questions? Because you do not want to
make the customer feel like you are running a survey on them. Sometimes I get these email surveys, and I think, maybe I like this company, maybe I will help them and give them some feedback, and that is great, so I click on the survey. Sometimes the surveys are short; sometimes they run into several pages, and at that point you just abandon it. So it is important to ask at most two questions. Then, third, based on the customer's responses, can we update the propensity score? So we are solving three problems here. Asking questions is not a traditional machine learning problem. Coming up with the right questions to ask is a kind of missing data problem, but it has not received a whole lot of attention, especially in live, interactive scenarios like a contact center. These second and third problems were what really differentiated what we at IBM could bring to the customer from what the bank was already doing. Now, how do we pick those two questions? What is the criterion? We built a question bank of 15 questions by talking to subject matter experts, the product experts for the bank's products. The question was which two of these 15 to pick, so we had to come up with a criterion. This is the problem formulation part. You already have a propensity score from the bank's historical data, so the criterion has to be which questions are the most useful in helping discriminate between the products. The bank's propensity score might have given
two products very similar scores; which question will help discriminate those propensities further? That was the criterion. So what we essentially wanted to do was model what we called the utility of a question: each question has a certain utility, we have to find it, ask the question with the maximum utility first, and then the next one. The model was like this. Essentially we took a systems approach, and by a systems approach I mean: do not lump everything together; break your problem into pieces. The first piece is the propensity score from all the existing historical data. That old propensity score then gets refined based on the responses to the questions: depending on what the customer answers, you get an increment to the propensity score, and this increment could increase or decrease the score. For any given product, the propensity could go higher, go lower, or stay around the same depending on the customer's response. Add the two and you get the final propensity score. So which question to pick is really a matter of which question contributes the most in that second box: how much could the customer's propensity spread out? You have a certain base propensity from the historical data; based on the customer's response, how much could that propensity move? That spread is the utility of the question. Say a question has two answers, yes or no; typically these are yes/no questions. If the propensity increases by 5% when the customer answers yes and by 4% when the customer answers no, there is only a 1% spread; that is
not a good question. It hardly matters whether the customer answers yes or no; the propensity barely changes. A good question is one where there is a large difference between the customer answering yes and answering no. So that was the utility metric. Now, we had some challenges. First, a cold start problem: we were starting fresh, with no historical data, and by historical data here I mean responses to the questions; we had none of that. Second, data grows very slowly: we did a pilot with a limited number of agents, so you cannot expect the data to grow fast; this is not a web-scale problem with tens of thousands of customers on your website. Third, very sparse data: each customer is asked only two questions out of 15, so it is a very sparse matrix. We handled the cold start through business rules: based on some business rules we chose the initial questions to ask each customer, and then, as responses got collected, we could compute the utility of each question and use that utility to refine which question gets selected. So let us talk about how we actually compute the utility. One of the actual questions was: do you hold any other credit cards? Remember, this kind of question is important because in the bank's data the bank has no idea whether you hold other credit cards; all they see is that you have their card. Now suppose you have an insurance product you are trying to cross-sell, and you have this question. We
found that the response to this question affects the probability that a customer will buy insurance. Why might that happen? Here is the model. Say the question is asked to a certain number of customers, and about 10% of them accept the insurance product; so the base propensity, before the question is asked, is 0.1, or 10%. Then the question is asked, and depending on the customer, sometimes the answer is yes and sometimes no. Among the customers who answered no, 25% ended up accepting the insurance product; among the customers who answered yes, that they do hold another card, only 4% ended up accepting it. So you see how this question helps: based on the response, we get a much better sense of whether the customer will accept the product. In fact, while this example is for the insurance product, the question affects the propensities of other products as well, so it really helps you re-rank the products; some products go up in the ranking and others come down. It is not just insurance that gets affected. So that is the model. The other thing that matters is the answer mix: if, say, 95% of customers answered yes and only 5% answered no, then even though the individual spreads of 0.15 and 0.06 from the base are large, on a weighted basis the overall spread is still small. So we really look at it on a weighted basis, weighting by how many people say yes and how many say no. That weighted spread gives us the utility of this question for the insurance product; we do the same thing for all the products, and that gives us the utility of the question. So we get a utility for all the questions.
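The utility calculation just described can be sketched end to end in Python. This is an illustrative reconstruction under stated assumptions: the exact weighting formula (answer-probability-weighted absolute shift from the base propensity) and all numbers other than the insurance example are mine, not the project's actual implementation.

```python
# Illustrative sketch of the question-utility model from the talk:
# a base propensity per product is shifted by the customer's answer,
# and a question's utility is the answer-frequency-weighted size of
# that shift, summed over the products the question affects.
# Formulas and names are assumptions, not the production system.

def weighted_spread(base: float, p_buy_given: dict, p_answer: dict) -> float:
    """Expected absolute shift in one product's propensity caused
    by learning the answer to one question."""
    return sum(p_answer[a] * abs(p_buy_given[a] - base) for a in p_answer)

def question_utility(per_product: dict, p_answer: dict) -> float:
    """Total utility of a question: the sum of its weighted spreads
    across all products it affects."""
    return sum(weighted_spread(base, p_buy, p_answer)
               for base, p_buy in per_product.values())

# The insurance example: base propensity 10%; 25% of "no" answerers
# accepted, only 4% of "yes" answerers did (spreads of 0.15 and 0.06).
other_cards_q = {"insurance": (0.10, {"yes": 0.04, "no": 0.25})}

even = question_utility(other_cards_q, {"yes": 0.5, "no": 0.5})      # 0.105
skewed = question_utility(other_cards_q, {"yes": 0.95, "no": 0.05})  # 0.0645
# The same raw spreads are worth much less when 95% answer yes.
```

With a bank of 15 such questions, the tool would sort them by `question_utility`, ask the top one first, then recompute on the remaining questions and ask the next.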
As we start out, the cold start problem is there, of course, but once we start collecting the data, asking questions and collecting responses, we keep updating the utilities, and over time the utilities settle down toward their long-term values. This is what the interface looked like; let me explain. Here we have a customer with some profile data, and what you see here are initials for the different products, like insurance, additional card, and so on. When the customer calls in, the questions pop up, the first question first. The first question here is: "Is your current payment method convenient?" This question is actually used to understand whether the customer might be interested in a current account or a savings account: if the current payment method is not convenient, they might be more likely to go for a savings or checking account. Then, based on the response to this question, another question pops up; this one is about whether you hold another credit card, and it appeared as the second question because the customer answered yes to the first. Had the customer answered no, maybe something else would have come up. So these questions pop up based on the customer's profile: firstly they are personalized, and they are selected to be the ones that are most discriminating for this kind of profile. And over here you have the ranked list of products; the top one is AC, which is a checking account, and the second one is a credit line increase. Each time the customer answers a question here, the ranking changes there. So what happens this way is that as the questions keep
getting answered, the agents can actually see the ranking changing, and they interpret it: a customer who does not hold other credit cards is maybe more likely to take up insurance. So the agents are also learning; they are learning to connect customer profiles with product fit, building a mental model of it in their minds, which makes the learning curve shorter. So what were the outcomes? We did an A/B test over six weeks, a pilot with a test group using the tool and a control group. The test and control groups were balanced in the sense that, because there is a lot of variability between agents, we made sure the two groups had the same mix of agents across the performance tiers. This is what we found over the six-week pilot: about a 15 to 18% increase in sales in the test group using the tool. The interesting thing was that the improvement in sales was largest for the agents who were new; for the experienced agents, the good ones, the increment was less, which is what one would expect. The other important thing was customer experience. The bank was concerned: this kind of probing might seem a bit structured and artificial; will my customer experience degrade? Thankfully no: hardly any customer declined to answer the questions. We had written the scripts so the questions could be smoothly inserted into the conversation flow, so essentially there was no customer experience impact. As for agent experience, the novice agents, the ones within their first two or three months, are the ones who found the tool the most helpful: it
helps them reduce the learning curve and improve their performance, so they were very happy to use the tool. Given these outcomes, where else can we use this? Is it just for sales, just for a bank's contact center? In fact, you can apply something like this to any kind of sales process: structure the sales conversation so that it can be more formalized, and once you have formalized the conversation, you can use a tool like this to encode what your really good salespeople are doing, to encode their best practices; that knowledge gets encoded in the tool. IT support is the other big area. All of us have this experience: we go to an IT team, and sometimes the people we encounter are really good and fix the problem; other times they struggle, and of course many of them are new people. Now, IT support tools like Remedy and ServiceNow have a lot of fields where, once the problem is fixed, the support person can record what they did: what was the root cause, what was the diagnosis, what did they do to fix it. Often that is not populated. To use something like this you do not need to build a new tool; you can use the data in the old tool, but you may need these IT people to put the right kind of information into it. Sometimes all they write is "I talked to David, he told me what to do, and I fixed it"; that is useless. Discipline has to come in, and once it does, you have this valuable data you can use. Customer surveys are another case: huge, long surveys. Are there any people from Amex here, American Express? I get very long surveys from you, with all
sorts of irrelevant questions. We could really personalize the survey and make it shorter; you do not have to ask every question of everybody, so you could do away with most of the survey. What else? Health care. More people are now using decision support tools in health care, advising doctors, based on the results of the tests, on what the issue might ultimately be and what the remedy might be. But one of the remaining issues is low confidence, and the confidence is low because there is so much variation: I respond differently than you do; each person responds very differently, and sometimes more tests are needed. So instead of just telling doctors what the diagnosis might be, when the confidence is low, tell them which other tests should be done to improve the diagnosis. Those are some of the applications. Something like IoT, on the other hand, where you have devices or cameras, may not be the right fit, because once you have installed the camera or the sensor there is no additional cost to collect more data; the only cost is installing a new sensor. But any time you have a human interaction, this is very, very useful for minimizing the amount of time people have to spend giving you information. So what is holding us back? Why are we not seeing more of this happening? I think there are two reasons. One is something called availability bias; you might have heard of it. It is the tendency to overweight the data that is available and neglect the data you do not have. That is what this cartoon is saying: yesterday there was an airline crash, MH370 or
one of these Boeing planes going down, and then you are scared to fly for the next several weeks. You are overweighting the immediate data at the expense of other data; that is availability bias. The same thing happens in data science. All of us data scientists get trained on data sources like Kaggle, where the data is given to you on a platter. The real world is not like that; you have to go and look for the data. So we do not really get trained to identify what extra data we need to solve the problem, and that is a gap in the training of many data scientists. The second reason is that in any organization, changing an existing process or tool is not easy; there will be a lot of resistance. People are used to doing things in a certain way, and this introduces uncertainty that people do not like. So what can you do here? For the bank, what did we do? I did not talk about that: the bank would not let us touch the CRM system, no way. So we built a parallel tool, a web-based tool the agents used to see the questions. All they had to do was log in through the browser; it would show them the questions, and they would answer the questions there. At the back end the two systems were connected, but it was just a one-way feed; the bank did not want us to push data into the CRM either, so the data only came from the CRM system. So yes, sometimes parallel tools are needed, at least in the pilot phase, and hopefully by the end of the pilot the organization is convinced it can do something more permanent. So I would really encourage people not to think of data science as just working with the data you have, but also to try to figure out what other data you need, because that is what will be responsible for the growth of data science. Otherwise
data science is going to stall at some point with the algorithms we have. What has been driving the improvements in deep learning? It is data: larger and larger datasets, speech datasets, ImageNet. It is the growth in those datasets that is really driving better algorithms, and the same applies even if you have structured business data. In fact, asking questions is really a sign of intelligence. This is a book I bought some time back, a nice book; if you have kids you might want to read it. A lot of our professionals, doctors, lawyers, detectives, do their main work by asking questions; that is really key to diagnosis, really key to finding out who the culprit is. Machines should ask more questions. In fact, I am going to close with this remark by Yoshua Bengio. He takes a somewhat negative view of deep learning, but I want to focus on this part of the message: AI needs to be extended to be able to acquire information. We cannot just keep focusing on the algorithms and getting better results on ImageNet; we have to help algorithms acquire information from the ecosystem as well. With that, I think that was my last slide, and I am more or less on time. Ten minutes? Good, so we can have a few questions now. Question: Since the conversation is happening on the phone, what kind of questions did you come up with? Answer: Mostly multiple-choice questions, yes/no, or things like one of our actual questions: what do you find more attractive, the reward program, the interest rate, or the interest-free days? So typically multiple-choice questions, entered by the agent. The question bank was created by experts, subject matter experts. Question: Adding to that, I have also built such a model at one of the largest private sector banks, where I needed to
recommend products to customers, the same kind of thing. It was a call center and we had data about the product, so we used the customers' past history together with exploratory data analysis, and with that combination we were predicting different products. So here also, was it a kind of exploratory data analysis that you used, since the question responses are yes or no and everything happens based on that?

No, we didn't do any exploratory analysis for the questions. The questions came from conversations, not from looking at data: we spoke to the agents, we spoke to the bank's product experts, and we came up with the questions we thought would be the most useful. And no, there was no speech-to-text; the agent was listening and entering the answer into a drop-down list.

How did you come up with the top two questions for each customer? I explained that with the tree diagram, if you remember. Say a question has two possible responses, yes or no. If the customer answers yes, there is some probability; if the customer answers no, there is some probability. If the spread between the yes probability and the no probability is large, we say the utility of the question is high, and the question with the highest utility is the first question to be asked.

But you need to ask the question first, right? Out of 15 questions you'll be asking only two to each customer? Correct, yes. So only once you ask, and the customer gives a yes-or-no answer, can you find the difference, the impact on the propensity score. So the very first time, how do you know which two questions to ask? Good question. I said we have that cold-start problem, and that is exactly what the cold-start problem is: to start with, you don't have any data. In that case we had business rules, and the first two weeks were a tuning phase.
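The utility rule just described can be sketched in a few lines of Python. This is an illustrative sketch, not the production logic; the question names and response data below are made up.

```python
# Sketch of the question-utility selection described above. For each
# candidate question, compare the conversion rate among customers who
# answered "yes" with the rate among those who answered "no"; the wider
# that spread, the more the answer discriminates, so the higher the
# question's utility. The highest-utility questions are asked first.

def question_utility(history):
    """history: list of (answer, converted) pairs, answer in {"yes", "no"}."""
    def rate(answer):
        rows = [conv for ans, conv in history if ans == answer]
        return sum(rows) / len(rows) if rows else 0.0
    return abs(rate("yes") - rate("no"))

def top_questions(histories, k=2):
    """histories: dict question_id -> history; return the k highest-utility ids."""
    return sorted(histories, key=lambda q: question_utility(histories[q]),
                  reverse=True)[:k]

# Hypothetical response data: "uses_other_card" separates converters
# perfectly, while "likes_rewards" tells us nothing.
histories = {
    "uses_other_card": [("yes", 1), ("yes", 1), ("no", 0), ("no", 0)],
    "likes_rewards":   [("yes", 1), ("no", 1), ("yes", 0), ("no", 0)],
}
print(top_questions(histories))  # "uses_other_card" ranks first
```

The same ranking would then pick the second question conditional on the first answer, which is where the tree structure comes from.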
Actually, I said six weeks, but the whole pilot was eight weeks: the first two weeks were just a tuning phase, where the questions were based purely on business rules and we collected some data, and then the six-week pilot really started.

Hello, I have a question. We have a set of questions for the agent; are machine learning algorithms really required to predict which question should be asked to a customer, or can it be done just with EDA? The point is, you don't need a sophisticated model. In fact, here we had a very simple model, just a discriminant analysis.

And the second question: you proposed in the last slide that the algorithm itself should acquire information, meaning acquiring the data. Is there any research on that, any algorithm which has been successful in acquiring data, or any progress towards it? This is a new kind of thing for me. Yes, there has actually been work in the past, but it hasn't received a lot of attention. There's something called active learning, which is about acquiring not features but observations: which observations to collect labels for. That is one area. The other area is feature value acquisition: which features to acquire. That's another area of research. Neither has received much attention; active learning was hot probably 20 years back, people were talking about it, and I haven't seen much happening after that. This work borrows some ideas from it, but our focus has been on a live interaction in the contact center. The older work mostly focused on something else: in those days, 20 years back, it was more about "I've got some data, which data do I need to transmit to a different place", because back then data transmission costs were high.
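As a rough illustration of the active-learning idea mentioned here, one classic recipe is uncertainty sampling: from a pool of unlabeled observations, ask for the label of the one the current model is least sure about. The scoring function below is a toy I've invented for the sketch.

```python
# Minimal uncertainty-sampling sketch of active learning: request the
# label of the observation whose predicted probability is closest to
# 0.5, i.e. the one the current model is least certain about.

def most_uncertain(pool, predict_proba):
    return min(pool, key=lambda x: abs(predict_proba(x) - 0.5))

# Toy model (hypothetical): probability grows linearly with the feature.
proba = lambda x: min(max(x / 10.0, 0.0), 1.0)

pool = [1.0, 4.0, 9.0]
print(most_uncertain(pool, proba))  # 4.0, whose probability 0.4 is nearest 0.5
```

Feature value acquisition applies the same cost-benefit logic to columns rather than rows: which missing attribute of an observation is worth paying to obtain.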
So the older work focused on a different type of problem. Excuse me, my question is: a big portion of feature engineering is checking data integrity, so how did you tackle falsification of data in this particular use case? If you ask a user "do you use another credit card", he might try to act oversmart, or try to game the process. How do you tackle those problems? We didn't worry about it. There might have been some falsification, I don't know; we didn't really worry about it.

So do you think these are only corner cases, or could the impact have been larger than anticipated? See, if a lot of customers are falsifying, the utility of the question will go down, because the people telling the truth and the people falsifying will cancel each other out. Then the question will no longer look important and won't be selected. So yes, if a lot of people are falsifying, this will not really work.

Is there a better solution to this problem? If you have a larger set of questions, then even when the utility of some questions goes down, other questions can substitute for them. But the way we looked at it was: why would a customer really falsify? We carefully selected the kinds of questions we asked so that a customer wouldn't feel offended or embarrassed to answer. In fact, our questions went through a regulatory process; the bank's regulatory team reviewed them. So we didn't suspect falsification really happening. Thanks.

Can you give some more details about your pilot? You said a pilot was done for two weeks?
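The cancellation effect just described is easy to see numerically. With made-up numbers: if falsifying customers mirror the truthful ones, the yes/no conversion-rate spread, and with it the question's utility, collapses to zero, so the selection mechanism drops the question on its own.

```python
# Falsification and question utility: when liars mirror the truthful
# answers, the yes/no conversion-rate spread cancels out, so the
# question's utility falls and it stops being asked.

def utility(history):
    """history: list of (answer, converted) pairs, answer in {"yes", "no"}."""
    def rate(answer):
        rows = [conv for ans, conv in history if ans == answer]
        return sum(rows) / len(rows) if rows else 0.0
    return abs(rate("yes") - rate("no"))

truthful = [("yes", 1)] * 4 + [("no", 0)] * 4              # everyone honest
falsified = truthful + [("no", 1)] * 4 + [("yes", 0)] * 4  # half of them lie

print(utility(truthful))   # 1.0: highly discriminating
print(utility(falsified))  # 0.0: the falsifiers cancel the signal
```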
No, it was a six-week pilot; two weeks before it were a warm-up phase, and then the real pilot was six weeks. So what exactly happened in the pilot? Was every customer asked two random questions, or two questions suggested by this tool? I thought that with this pilot you were learning. No, the first two weeks were the learning phase, when we were just collecting data; it was still through the tool, but business rules, some business logic, were used to select the questions. And your dependent variable, what you're modeling, is whether the customer took the product or not? Yes, the target variable is whether the customer took the product or not.

There's a question here. You mentioned there's an update to the propensity model: the score gets some delta, positive or negative. Is it memoryless, in the sense that a question asked first versus second is treated the same, or does the second question impact the final propensity score more than the first? Basically, by the end of the call, is it memoryless or stateless? No, there is a state. After the first question, maybe the propensity went up by, let's say, 0.05, and after the next question it went down by, let's say, 0.10. The total impact would be +0.05 minus 0.10, so the net propensity would go down by 0.05.

Let's say you extend this from two to, say, ten questions. Do you see a problem with this kind of aggregation? Yes: what could happen is that sometimes the propensity score goes above one or below zero.
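The additive scheme in this exchange, and the overflow problem it can produce, look like this in code. Only the +0.05/-0.10 pair comes from the discussion; the base score and the ten-question deltas are invented for the sketch.

```python
# The stateful additive update described above: each answered question
# shifts the running propensity score by its learned delta.

def update_propensity(base, deltas):
    score = base
    for delta in deltas:
        score += delta
    return score

# Two questions, the numbers from the discussion: net change of -0.05.
print(round(update_propensity(0.40, [0.05, -0.10]), 2))  # 0.35

# Ten questions (hypothetical deltas): the raw sum escapes [0, 1].
print(round(update_propensity(0.40, [0.1] * 10), 2))     # 1.4, no longer a valid probability
```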
We thought of that; what we would have to do is put a sigmoid function on top, so the propensity stays in that range. But then it's not purely additive, right? It becomes a nonlinear relationship, things go up and come down, whatever the relationship is, so I'm not sure this really works well for more than two questions. Yes, that's true: with many questions you would have to modify this, probably move from a simple univariate model like this one to a multivariate model; a multivariate model would make more sense.

And one comment on active learning: Snorkel and data programming, as a way to generate training data, are still popular; a lot of active research is happening there. Yes. All right, I'll be around for some time. Thank you.
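A minimal sketch of the sigmoid fix mentioned in that answer (the per-question deltas here are invented): accumulate the deltas in an unbounded raw space, then squash the sum through a sigmoid so the final propensity always lands strictly between 0 and 1.

```python
import math

# Sigmoid bounding for the additive scheme: the raw sum of deltas is
# unbounded, but the sigmoid maps it back into the open interval (0, 1),
# so the reported propensity can never overflow past 0 or 1.

def squash(raw_score):
    return 1.0 / (1.0 + math.exp(-raw_score))

deltas = [0.4, 0.3, 0.5, 0.6, 0.2]  # hypothetical shifts from 5 questions
raw = sum(deltas)                    # about 2.0, already past a [0, 1] range
print(round(squash(raw), 3))         # roughly 0.881, safely inside (0, 1)
```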