All right. Well, hi everyone, and thank you for joining. This is, I believe, session five in our series of conversations called Making Data Science Work. Thank you so much for joining us. Venkata and I are from Scribble Data; as you know, we are a feature store company in the ML engineering space. Today's topic is experimentation in data science, and this is a theme that has been resonating through the last couple of sessions whether we wanted it or not: the "science" in data science, which refers to the process of developing a systematic understanding of the world. There are direct implications for how people take what they see in experiments and implement it in ways that actually bump up the top line or the bottom line.

Today we are joined by two people for whom this is their lifeblood. Thank you very much for joining us, Paul and Bhargava. Let me quickly introduce you.

Paul is the co-founder of Aampe, a SaaS product that takes customer communication and turns it into a business growth engine. Paul had earlier co-founded and was chief data officer of PaySense, a consumer lending startup in India, which was acquired in 2019 for the ridiculous amount of $185 million. Other roles he has held include chief product officer at DT One, VP of data science at Housing.com, and principal data scientist at Teradata. He has an academic research background in computational social science and began his career building statistical inference systems for the US Department of Defense. Paul is also an advisor to an early-stage VC fund in India and an angel investor in startups in Asia and the US.

Bhargava Subrahmanyam is the co-founder and CTO of Binaize. Bhargava, did I pronounce that correctly? All right, wonderful. Bhargava has spent the last 17 years helping businesses, both large and small, use data and algorithms to build a moat. He has worked on large-scale machine learning problems in transportation, banking, software, and networking products. He is a trained statistician, having done his master's at the University of Maryland, College Park, and at his startup Binaize he is focused on helping small and medium-sized e-commerce companies improve conversion rates. In an individual capacity he mentors people in their data science and entrepreneurial journeys, which is exactly the kind of guest we like to have on our podcast and at our meetup. So thank you both, Paul and Bhargava, for joining us.

Today's topic, as we just mentioned, is experimentation, and so many questions come to mind when we think about it. Even earlier, when we were talking about this, from the setting up of the problems all the way to execution, there were so many rabbit holes we thought we could go down, and I would love to get started right at the outset. Before I do that, one quick note to everybody listening in: as you know, this meetup streams across multiple platforms. Some of you are on Zoom with us; some of you are watching on YouTube or on Twitter. We are watching your comments.
We are looking at your questions, so feel free to chime in, and we'll bring them in when it makes sense in the context of this conversation.

All right. With that, Paul and Bhargava, let me start with something fairly foundational: why experimentation? What is the benefit to a business, to a data science function? Why should they think about experimentation? I know this is fairly basic, but just to get everybody on the same page about what we mean, I'd love either of you to take this question.

Okay, maybe I'll go. Thanks for having me. So, think about the traditional ways in which you can understand customers. You can do quantitative and qualitative studies, you can do ethnographic studies, you can send surveys, or, as the current trend goes, you can use data to extract trends, build machine learning models, and understand. Well, all of these things are either backward-looking or done in a setting that is not equivalent to real life. If you run a usability study, you're bringing people into a room and learning about them there, which may not translate into the wild. Experimentation provides a set of tools and techniques that helps you understand the user and understand your business better in the wild, in real life. That's one reason it's extremely critical for any business.

The second thing: the way machine learning models are currently built, they are so backward-looking. This is also pretty much what you deal with at Scribble, with the feature store: the data keeps changing over time, and you need a better model of your customers, which impacts your business model. That's extremely important and critical, and I firmly believe that experimentation is the way to get there. It's pretty sad that people don't talk enough about it, but I think it will be the next big wave, in my opinion.

Paul, would love to hear your thoughts.

Yeah, sure. I think that was a great intro. I would just say that, like "analytics" and "computational thinking" and the other terms we use when we're talking about data science, "experimentation" is a pretty broad term, and I think some of the confusion and some of the uncertainty around it comes from that breadth, from what it refers to. In a way, "why should businesses care about experimentation?" is really "why should humans care about experimentation?" And humans care about experimentation because it's one of the fundamental ways we learn, from when we're little. I have a ten-month-old baby, and it's amazing.
He's constantly experimenting: everything is in his mouth, and he loves putting new things in his mouth. We start doing that as infants, and we continue to do it throughout our lives. The science of experimentation, doing it systematically, is just a way of taking that core capability and making it scale, making it more trustworthy, more reliable, safer, and better at managing risk, in order to achieve our ends and our goals. So I think the opportunity, as Bhargava mentioned, and where the broader trends in industry are going, is that there is so much more value that can be captured and generated from doing experimentation systematically, robustly, and rigorously as a practice, rather than as the sort of de facto informal thing we all do.

If you look at the words "experimentation" and "science," the model they conjure in our minds is academic work that goes on for many years, with who knows what outcome at some point in time. Can you comment on the economics of experimentation in enterprises?

The economics of... can you elaborate a little more on what you mean?

For example: by the way, Paul has a great article linking to a bunch of very interesting works, including one from HBR; I remember reading it. When people talk about experimentation, the thought that comes to mind is that it's useless, because most experiments are going to fail, right? That creates the issue of how you justify this in organizations. I think that article was written by Dan Ariely, if I'm not mistaken; I may have that wrong. We'll link it from our event page for the audience, by the way. So where is the hesitation around experimentation coming from in organizations?

Yeah, I'll take a quick crack at it, and then maybe Bhargava has an interesting answer here. I was half puzzled, half amused when I first read that, but I think it fundamentally comes down to this: we just don't like to know when we're wrong. It's uncomfortable. When you run experiments... he quotes Jeff Bezos, who talks about the economics of making a lot of bets, because the ones that pay off pay for everything. That's how venture capital works, and how a lot of startups work as well, but it's true for bigger industry too, in terms of innovation and new product development. The thing is, that's not unique to experimentation: companies are getting it wrong all the time, regardless of whether they run experiments. Experiments just let them learn it faster, which I think is very exciting. It lets you save time by figuring out where you're wrong faster, so that you can refocus on, build, and compound the things that work.

So look: in the world we're in, with so much change, with so much competition, and with everyone else looking to satisfy the people you're doing business with, your customers, I don't think it's really a choice. If you don't figure out how to run experiments in order to innovate, create new offerings, and engage with people in the world they're in today, not the world
they were in last summer or last year, then you will atrophy as a business. There's no way around it.

Well, you make such a good point, because experimentation as a framework is a way to bound your losses rather than getting it wrong at full scale. You're able to define the scope of where you're experimenting, rather than rolling something out and then realizing it didn't work; you section off the part of the business you're willing to stake, so that you can observe the effects and cap your downside. Does that jive with your sense of what experimentation looks like?

Yeah, absolutely. I think that's a great way to put it. It would be really interesting, and I hope at some point we get to some of the technicalities, the rules or systems of thinking for this. But I think it also totally depends on the kind of experimentation you're doing. There's high-level strategic experimentation, "let's try this new thing," and then there's the kind that's more like machine learning. Machine learning is really valid in a confined, constrained context: you don't use machine learning to figure out which strategic direction to take as a business, because you don't have data for that. Machine learning is about repeatable decisions that you have data for. There are some basic conditions for it, and experimentation is similar: to do it systematically and technologically, the way I think both Bhargava's company and my company are doing, requires a certain context, and that's different from everyday or higher-level business experimentation. Do you agree with that, Bhargava? Does that resonate?

Absolutely, I totally agree. Let me just go back to one of the points from a while ago, about this being so deep in the academic world but not so much on the industry side. I have a story to share. A hundred years ago there was a man called William Gosset; do you know the story of William Gosset? Paul does, obviously. Okay. This man was in Dublin, trying to make the best stout, the best beer, possible, and large-scale experimentation was just not possible on the production lines he ran. So he and a colleague took a year off and wrote what is now the seminal paper in statistics, "The Probable Error of a Mean"; we commonly know it as the t-test. And he went and implemented it at Guinness. Guinness, obviously, became a world leader in beer, and the Guinness factory in Dublin is among the most visited sites in Europe. So here was something that, with a small amount of data, could do something that won over customers. It didn't start from an academic base; it started in industry, on a factory floor. It's just that, and this is the point Paul was also making, for a number of years after that it stayed buried, predominantly in academia and not so much in industry, except in one field, which I'll come to in a bit. I think with the digital wave, people have been trying to build more tools and do more around it to help improve things.
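A minimal sketch of the small-sample comparison Gosset's paper made possible; the two batches of yields below are invented for illustration, and scipy's ttest_ind stands in for the original hand calculation:

    # Two small production batches; did the process change shift the mean yield?
    from scipy import stats

    batch_a = [70.1, 68.4, 71.2, 69.8, 70.5, 68.9]  # yields under the old process
    batch_b = [72.3, 71.8, 70.9, 73.0, 72.1, 71.5]  # yields under the new process

    t_stat, p_value = stats.ttest_ind(batch_a, batch_b)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    # A small p-value means a gap this large between the sample means would be
    # unlikely if both processes really had the same average yield.

The whole point of Gosset's work was that this judgment stays honest even with only a handful of observations per batch.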
This brings me to the second point, on economics. It's like the point Paul just made about VC: you need to make a lot of bets, but you have to be structured in how you make them. One of the links in his blog post talks about experimental design and quasi-experimental design; it's a must-read for anyone who runs a business, not just anyone who runs data science. It's extremely important: how do you think about taking bets, making informed decisions, and using that to grow your business?

Sorry, Venkata, before your question: Paul or Bhargava, just on that last bit, because I think it would segue neatly into some of the more technical aspects of experimentation. Are there any points from that post that come to mind that people can take away, or shall we just assign it as reading material? Right, Venkata, you had a question. And we'll link Paul's article from our event page; we'll link the other article as well.

My thinking was going in this direction: in an enterprise, there are preconditions, some stability and some structure, for you to be able to do experiments. Where are the opportunities in the enterprise to do experiments? Which spaces have gathered more momentum, and which ones remain to be done, if you will? Any sense of the landscape?

Sorry, maybe I'll go first. One of the places where we really see experimentation at scale is clinical trials at pharmaceutical companies. Like the point I was making before, that's one part of an industry that has taken this up and really established itself, doing large-scale randomized testing, clinical trials, to get drugs into the market. But the sad part is, a few years back I consulted with one of the largest pharmaceutical companies, and you would expect that being mathematically and experimentally driven would permeate all parts of the company. Unfortunately, that's not the case, especially in pharma; well, maybe it's to do with regulations and compliance. But I also see this a lot in other industries. Manufacturing: we can talk about William Gosset a hundred years back, and yet experimentation is quite minuscule in the manufacturing industry. A lot of what happens right now is in the digital space, and marketing is one function that has really led this; that's the specific space both Paul's company and mine work in. But a lot of industries are just waiting to be disrupted.

Paul, your thoughts?
Yeah, so I would say: there's this great book, The Taming of Chance, or maybe The Emergence of Probability; the author's name is Ian Hacking. He has done a lot of work on the history of these things, and he has this quote about the quiet statisticians, and how the mental models they built have changed our way of thinking and changed our world, even though they don't get a lot of credit for it. And I would say there are actually a lot of industries where experimentation is incredibly important and useful. I saw a statistic recently, in the last couple of months, about buying a car in the US in the 70s. If you bought a brand-new car in the US in the 70s, right off the lot, you had about a 20 to 30 percent chance of that car being a dud, a lemon, such that you had to send it back to the manufacturer to get it to work. Think about that: you bought a new car for thousands, or tens of thousands, of US dollars, and it doesn't work. That just doesn't happen today; you would never experience it, and not for so many other things either. I just got a new Dell XPS 13 laptop; I was excited, and there were some things about it I wanted to explore, and when I shipped it from the US to Singapore, it didn't work, and I was so shocked by that. But the truth is that, in general, when you spend money buying something that's been manufactured, you can pretty reliably trust that it will work. The reason is experimentation: manufacturing lines have done so many things to figure out and optimize exactly what the flow is, to eliminate as many bugs as possible from the process, and a lot of testing is done to improve those processes. So we take for granted a lot of things that 30 years ago would just not have been part of our world, and that's amazing to think about.

But I don't want to talk too much about that, because I don't think that's the world most of us live in. Where I'm really excited is the world of tech, which is where I mostly spend my time, but which also produces the products that shape most of our lives, whether it's entertainment or transportation or delivery or e-commerce and shopping. These spaces are ripe for so much to be done. And I say this as someone who has spent my career in data science: I see so many young data scientists and people training up, doing Kaggle competitions and Coursera courses, and so much of that is focused on fitting random forests or pulling data into a Jupyter notebook and doing analysis. And then there's been this debate in the last ten years, and I think you have even done a conversation about it, about how much of data science is the modeling versus the engineering: putting something into production means pipelines and software code, and is it reliable, etc.?
And it's at that intersection, on the engineering side, where I think there are some really amazing opportunities to do really good data experimentation. But I also think we need a revolution: much more focus and attention on tools for this kind of thing. It's the same as ten years ago. When I started, scikit-learn was a very new thing, and in fact most data scientists were just starting to adopt R, coming out of SAS and S and other tools. The world now, with pandas and scikit-learn and all these libraries, is completely different in terms of the work you have to do, and I think it's very similar for experimentation. There are a lot of things around sample size estimation and power estimation, and confounding; a lot of the technical things we'll maybe get into a little later. You need to learn those methods; you need to learn about things like causal inference. But once you do that, and if we build tools that let you do it more effectively, then I hope that within five years you're going to see data scientists looking everywhere within their company for opportunities to create experimentation layers.
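To make the sample-size and power estimation Paul mentions concrete, here is a minimal sketch using statsmodels; the conversion rates are illustrative assumptions, not numbers from the conversation:

    # How many users per group to reliably detect a lift from 5% to 6.5%?
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    effect = proportion_effectsize(0.065, 0.05)  # standardized effect (Cohen's h)
    n_per_group = NormalIndPower().solve_power(
        effect_size=effect,
        alpha=0.05,  # tolerated false-positive rate
        power=0.8,   # chance of detecting the lift if it is real
        ratio=1.0,   # equal group sizes
    )
    print(f"about {n_per_group:.0f} users per group")

Run before the test, a calculation like this tells you whether you even have enough traffic to learn what you want to learn.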
You know, Paul, this idea of tools for experimentation hits home so hard, because it seems like a lot of the data science folks we get to speak to, for some reason or another, appreciate the value of doing experimentation; they have the background, they understand all of that, so it's not that they're unaware of it. But because the investment hasn't been laid up front, both from a business perspective, with senior leaders budgeting for experimentation, and in the tool chain, including the processes to set up for experimentation, in my experience it usually ends up being something done at the end, to get sign-off or buy-in. So when you talk about tools, it sounds to me like there's a concerted effort to build this into the culture of the organization, and I would love to hear a little about how you think about that. I know Aampe is squarely in the middle of this space, and Bhargava, your company Binaize as well. So, a little bit about tools, and what you think the intersection is between actually building a tool and the culture needed at the customer organization to implement it in their context. Any thoughts on this would be helpful.

Yeah, I can go. Go ahead.

So I would just quickly say that I'm a very practical person; well, I like to think that. What I'd say is, I completely agree that data science and computational thinking and experimentation are about a mindset, about the way you approach decisions. But I also think they're somewhat derived from the practicalities of doing the work, and so I've chosen to focus my participation in moving the world in this direction, which I think is a positive direction to move in, on more bottom-up work. You know, I grew up in the US, but I've spent much of my career in India, and I learned things there about different forms of hierarchy and organizational process. You often hear people say Indian businesses are very hierarchical, so there's this notion that you get the boss to decide what's right and then the organization follows. For sure, that's true; it's true in the US as well, to some degree. But data scientists, and developers and software technical people in general, and I think product managers and others as well, have the capacity to influence the direction an organization goes. They're the experts in the methods; they're the experts in the work being done. The good ones, the creative ones, the ambitious ones aren't going to sit back and just do everything as instructed. The general strategic direction, yes, but how they operationalize those guidelines is up to them. So I see the future as coming from us as workers, as operators, making better decisions, the results of that floating up, and that changing minds. But in order to make that work, I think you need tools, because you can't ask someone to write everything from scratch and set everything up; that's risking too much. The best way to get change is to create tools that everyday data scientists can take up to demonstrate a better way, in order to convince their managers or their company's leadership that it's a good direction to go in. So I'm a very big believer in bottom-up, and to do that you really need to make it possible: you have to create tools that enable these things to work better.

And the ideas will change, but at Scribble we have this idea of a metric called cost per question. If the cost is very high for every single question you ask about your customers' behavior, it automatically creates an enormous amount of friction against asking more questions and doing more of this experimentation. As the cost per question keeps going down, suddenly our minds open up to how many different kinds of questions we can ask, and in how many different ways. There's a number out there that says Google runs something like 10,000 experiments, all of these insane numbers, and that has all become possible because it's lubricated by a very efficient, very systematic experimentation system. Ron Kohavi and others have talked about this fantastically.

Bhargava, do you want to share your journey and your thoughts as well?
So, a couple of points here. I have one huge takeaway, but I'll break it down into three: the first one now, the other two a little later. The first is the explore-exploit framework. That's a very powerful framework to think with. Look at the way any organization works: data science, marketing, supply chain, operations, human resources, it doesn't matter; any functional unit within a company. They're all optimized to exploit at any given moment in time. But as humans, this is also how we evolve: you always need to explore a bit and then exploit. Unless you explore, you won't understand more of what's happening in your business, your customers, your data, and your processes.

There's the famous saying that data is just a means to the end, which is truth; it's very oft-repeated. Because whatever data you see is probably not fully accurate either: there's measurement error; there's the setting in which the data was obtained; and there's this whole concept of dark data, data that is not visible, that we have no access to at all, yet which influences the process. At this day and age, I don't see any company out there proclaiming "we are not a data-driven company"; every company wants to use data. We're no longer in the education phase of this evolution, where we need to explain the importance of data. But the importance of experimentation, definitely yes, because this concept of explore-exploit does not come naturally to a lot of people.

That's a great point. It's true with my kids too: anything they see, the first thing it does is go into the mouth, and then they learn very quickly. That's them taking bets, figuring out "this is food, this is not food," and getting better over time. In a very similar way, companies should do the same. It isn't happening right now because of the extremely limited set of tools out there.

So, to the point about cost per question: a very similar thing, the cost of running an experiment, is something my customers ask me about all the time. That's so unfortunate, because any time you run an experiment, you learn more, and that is a lot more valuable than attaching a negative connotation of cost to it. I'll put my neck out and say that the existing players in the market have done a disservice there: they've made it more fear-inducing than, and I like the way Paul put it, a way of moving the world in a positive direction. Experimentation is exactly the way to move the world positively, because you understand more, you create better mental models, and you start exploiting them.

As for the tools currently in the market: there's only one unicorn in the space, Optimizely, and it's clearly targeted at extremely large enterprises. You need an army of statisticians to work with it and implement it; it's not viable for a normal business user. I specifically target small to medium-ish e-commerce customers, and they say, "Experimentation? Yes, we were told about this by a mentor, but I don't think we're ready for it," because the tools are extremely expensive and extremely complex. "I can't go spend thousands or hundreds of thousands of dollars hiring people, buying tools, setting up infrastructure."
I think the tools needed to drive this are still in their infancy, and we need a lot more people building this ecosystem out.

So, for both Bhargava and Paul: I know that the marketing domain within larger organizations is a core focus for both of you, whether from the communication side, meaning how companies converse with their customers, or other aspects of marketing. Could you help us understand some standard methods for how a data science team might set up experimentation? What I'd love to understand is the shelf life of the learning from that experimentation: when a data science team is doing all of this, how should they think about doing it not just once, but as something they build into their process for subsequent work? Maybe it ties into the cost per question as well.

Yeah, I was just going to say, cost per question is a great concept, a great way to frame it. And this connects as a nice segue from tools, I'd say. When we think about experimentation, just to reconfirm it: ten or twenty years ago, if you looked at the number of companies doing machine learning in production as part of some aspect of their product, it was very few, and now it has become more democratized because of tools, so a longer, more downstream tail of companies can do it. I think it's similar with experimentation. And now there are companies in this space; marketing is an interesting place, where there are tools that give token acknowledgement to experimentation and build capacity to do certain things. I'll give some examples of where I think the gaps are, in the direction of making this possible and of figuring out this question of how long an insight stays worthwhile.

A/B testing is often used as a placeholder for experimentation, which I think is a shame, because A/B testing is one limited use case of experimentation. It's an important one, it's a good thing, but it's definitely not everything experimentation involves. And A/B testing is a tool that can often be used quite poorly and can often be quite misleading, and part of the problem is that the tool doesn't really help, in a lot of ways, with some of the core aspects of what you need to do it well. So, an A/B test: I'm sure everyone's familiar with it, but let's just review. An A/B test basically says, we're going to try two versions. Say we're sending a message to users, and we've got message A and message B, and we want to see which one works. A/B testing is just sending message A to some users and message B to others, and usually what the tool gives you is randomized assignment, which means that, from your pool of, say, a thousand users, there is nothing about those users that influences their assignment to the message A group or the message B group.
That's great in and of itself; it's an important part of the test. But randomization is only saying that nothing is affecting your assignment. When you get the results back, the inference you're trying to make is about a difference in the mean, really the conversion rate, between those two groups, and the question is whether the difference is there because message A or message B is better. You're making an inference about causation. And the huge problem with that inference, with taking away that insight, is that your tool almost certainly did not help you handle confounding.

Confounding just means there's a bunch of other information that characterizes your users; this is stuff that would be in your CRM, for example: age, how long they've been on the site, how many products they've bought, average purchase value in the past six months. Many of the data points we would use to fit a model or do other interesting data science things with. Those all interact with the likelihood of converting. We're going to get technical here for a minute: take any one of those, and you can reasonably infer that in some cases it influences the propensity to buy. Someone who purchased a higher amount in the last month is more likely to continue purchasing, right? And the issue is that when you do random assignment for a simple A/B test, you end up with imbalance in these other features. I've assigned 500 users to message A and 500 to message B, but because I've done it randomly, without taking into consideration, say, each user's ARPU over the past month, I'm going to have imbalance. It's like the four of us playing poker or some other card game: even if the deck is completely randomly shuffled and a hand is dealt to each of us, it's very unlikely that we all get even hands; someone's going to get a better hand. Similarly, one of the groups, message A or message B, is going to have a higher average ARPU, for example, among the customers in that group. So a week later, when you look at conversion and see that message A has a higher conversion rate, if that group also differs on some other demographic or behavioral feature, you're stuck: you don't know whether it's because of message A or because that feature had higher representation in that group. That's the concept of confounding, and if your tool doesn't help you handle it... there are ways to do it, but it's a pretty tricky problem.
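A minimal simulation of the imbalance Paul describes; all numbers are made up, and the point is only that a perfectly random 50/50 split still leaves the groups unequal on a covariate like past purchase value:

    import numpy as np

    rng = np.random.default_rng(7)
    past_purchase = rng.lognormal(3.0, 1.0, size=1000)  # fake ARPU-like covariate

    assignment = rng.permutation(np.repeat(["A", "B"], 500))  # pure randomization
    mean_a = past_purchase[assignment == "A"].mean()
    mean_b = past_purchase[assignment == "B"].mean()
    print(f"group A mean: {mean_a:.1f}, group B mean: {mean_b:.1f}")
    # The two means will typically differ; if past purchase value also drives
    # conversion, that gap confounds the message-A-versus-message-B comparison.

Rerunning with different seeds shows the imbalance landing on one group or the other; randomization guarantees nothing about any single draw.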
Um, another example...

Sorry, I was just going to ask, on what you said right now: there are going to be so many variables in the setting up of a good experiment, and in constraining it in the healthy ways you're talking about. How much of this do you think can be driven by a tool, versus having a really good, judgmental human in the loop? You made a choice; you're an entrepreneur today building a tool in this space. How much of it are you looking to offload to the tool, versus how much needs to be in the hands of thinking data scientists at the customer organization?

Do we have a disclaimer that I haven't paid you guys? It sounds like you're setting me up for a win here, Indra.

No, go for it. Go for it.

Seriously, though: I think there are amazing things you can do if your tool sets you up for them; much better, more reliable inferences you can make. And remember, data science is not a hundred-percent game. It's a game of probabilities. It's not about being a hundred percent confident in your inference; it's about whether you can shift your confidence from, say, 60 to 65 percent, and that can translate into a really significant amount of revenue for your business. And that is, extensively, a question of whether your tool is helping you handle some of these methods and some of these technical and statistical problems.

The other example, which I think is even better, relates to what Bhargava was saying about the explore-exploit trade-off. Doing experimentation well, getting your conversion rate up and having a good ROI, is also a question of efficiency. If I have a thousand users and I assign 500 to message A and 500 to message B, and message A ends up being better, then I've wasted 500 users sending message B. Could I have learned that message A was better with 100 users in each group, or 150? That would have saved a lot: you can take the improvement in conversion and multiply it by all the people in your B group, and that's again a significant lift. So can you have a tool that lets you monitor your assignment groups and monitor conversion, especially if, say, it's a link in a message or a checkout, and actually sample efficiently?
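Paul's efficiency arithmetic, sketched with illustrative numbers; the true rates and the 150-user stopping point are assumptions for the example, not figures from the conversation:

    # Expected conversions from 1,000 users under two allocation policies.
    rate_a, rate_b = 0.10, 0.07  # assumed true conversion rates of the two messages

    fixed = 500 * rate_a + 500 * rate_b
    # If 150 users per group had been enough to pick the winner, the
    # remaining 700 users could all have received message A:
    early_stop = 150 * rate_a + 150 * rate_b + 700 * rate_a

    print(f"fixed 500/500 split: {fixed:.0f} expected conversions")
    print(f"stop early at 150:   {early_stop:.1f} expected conversions")

The gap between the two numbers is exactly the lift Paul is pointing at: users spent on the losing arm after you already know the answer.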
And that brings me to your original question, which I'm sorry it took me so long to get to, which is: how long is an insight good for? I would say, in today's world, probably not long. Which is why it's so important that experimentation is a continuous, systematic part of your process, and in order to do that you have to have it set up well. I talk to so many companies, some of the best ones you can think of in India, the ones that raised the most money, and they'll do things like set up a new feature, launch it at the beginning of the quarter, run an A/B test, learn that one message works better than another, or that a particular flow works better than another, and then set it. But then they're overwhelmed with so many other things, as any business is going to be; you have way more to do than you have time for. So they'll keep that flow for the next six, eight months, a year. And what's the chance that the message or the flow that worked in January still works for you around Diwali in October? Not much, right? But you can't go back, and you're not learning: when does the seasonality change, when does the world change, when does the competition move? So you really need to be continuously running experiments, not just at the beginning of the quarter. If it's a manual effort, you probably can't afford to, because the cost per question is too high. If it becomes more automated, with tools doing it, you bring the cost per question down; now you can do it, you learn more, and you get ROI from it.

This ties in nicely with a question that has come in on YouTube, from Ankit Dube; probably Bhargava can take this. Can you build off what Paul was saying and talk about the experiment design process? Paul talked about, for example, the repeatability of this. Can you talk about feasibility, or planning, whether you have the right data to be able to make that assessment, and so on? The experiment design process.

Okay, that's a good question. I'll take some examples from things I've done in the past: not at my current company, but before that, at two major places where the work was entirely driven by experimentation. One was recommendation systems, a personalization engine. This was a regional content app I was helping build recommendation systems for, and the entire consumption was based on what was personalized for that particular user: no two users saw the same screen, and the content that came to you was the only thing you could consume. The objective was to increase engagement on the platform, and that was the metric, the KPI, the business owners were looking at. Now, with something like that, you can obviously build hundreds of different types of recommendation systems, right from simple similarity-based approaches, to content-based and collaborative filtering, to complex deep learning models; a whole array of models can be built. But they're all built, as I said, on engagement that happened before. So if you want to do something in practice, one of the things I've done is to segment your users in an appropriate way. For example, if you run a Facebook campaign, it lets you create audiences, so you have to create smart audiences: sometimes rule-based, if the business users want them built a specific way around users with very specific interests, and sometimes using statistical machine learning models to segment your users. Then you run different models on them, use the inference, and start working toward improving it.

Just in terms of technique, one of the things that has really worked for me is multi-armed bandits. You have multiple arms, you run this explore-exploit framework, and you see which works better. One benefit is that, this being recommendation-driven, the test is running all the time, so for the company it's just a continuous progression. Users' preferences obviously keep changing: what's hot, what's new, the content they've bought, everything impacts usage. So that's typically one setup, where you use different cohorts to see which kind of recommendation model works. Netflix has written some amazing articles around this.
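A minimal sketch of the multi-armed-bandit technique Bhargava names, using Thompson sampling over three hypothetical recommendation models with simulated click feedback; this is a generic illustration, not his production system:

    import numpy as np

    rng = np.random.default_rng(0)
    true_ctr = [0.04, 0.06, 0.05]  # unknown in real life; simulated here
    wins = np.ones(3)              # Beta(1, 1) prior per arm
    losses = np.ones(3)

    for _ in range(5000):                 # one user impression per step
        samples = rng.beta(wins, losses)  # sample a plausible CTR for each arm
        arm = int(np.argmax(samples))     # serve the arm that looks best now
        clicked = rng.random() < true_ctr[arm]
        wins[arm] += clicked
        losses[arm] += 1 - clicked

    print("impressions per arm:", (wins + losses - 2).astype(int))
    # Traffic concentrates on the best arm while the others keep receiving a
    # trickle of exploration: the explore-exploit trade-off in action.

Unlike a fixed A/B split, the allocation adapts as evidence accumulates, which is what makes the test cheap enough to leave running all the time.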
We tried doing something similar with artwork: experimenting with different images someone sees, based on the time of day, for a particular segment. Those are examples of specific uses, but this could just as well be used to drive engagement more broadly.

Maybe Paul can give a slightly more structured breakdown of the design process itself: the selection of the data, evaluating whether it has the quality and the signal, whether the collection process is not corrupted, the assessment of the readiness of the data, the selection of the problem, and so on.

Happy to. I would say there are two aspects of the experiment design process that bear on how you can institutionalize it, put it into technology, into a tool. That's what we're focusing a lot on at Aampe, which we've done with some of our early customers, and which I'd done at my previous company, where I was working with many telcos, messaging their prepaid subscribers.

The first aspect comes back to the example of confounding I talked about. Conditioning is the way you deal with confounding. When you design an experiment, if you don't do it naively, in the sense of randomly assigning without looking at any underlying features of those users, what you're doing is conditioning: you're saying, I'm going to balance these variables that matter. But then you come to Indra's problem: you have too many variables that matter. And one of the things I like to emphasize as a data scientist is that a lot of the time we should be ignoring data. That's a bit provocative: you should ignore much of your data, and the reason is that much of the data you have about humans is duplicative. There are high correlations, multicollinearity, and other structure like that, where one piece and another piece, two snapshots, end up giving you the same insight about the fundamental thing you care about, which is the probability of converting. In that case, balancing over both of those features would be exponentially expensive, versus reducing the space by recognizing the duplication and simplifying. So dimensionality reduction plays a huge role in good experiment design, by letting you efficiently find the data that matters. And the cool thing, what's so powerful about this, is that there's an opportunity to learn two things: one, which message is more effective; and two, which segments of your population have a higher baseline propensity to consume, or purchase, or convert. That itself is really useful. Even if you find it wasn't your message that did it, but that this particular segment of users is highly likely to convert, you can target those users going forward, and you can figure out and fix your product. Which is why, at Aampe, we're definitely not just focused on marketing: we actually think this is a tool product managers can use, because it's about making a better product. So that's one aspect, the dimensionality reduction and conditioning side of experiment design.
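One simple way to implement the conditioning Paul describes is stratified assignment: balance the groups by construction on a covariate that matters. A minimal sketch with a made-up spend bucket (a real design would stratify on the reduced feature set he mentions):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    users = pd.DataFrame({
        "user_id": range(1000),
        "spend_bucket": rng.choice(["low", "mid", "high"], 1000, p=[0.5, 0.3, 0.2]),
    })

    def assign(group: pd.DataFrame) -> pd.Series:
        # Split each bucket ~50/50 so spend_bucket cannot confound the test.
        n = len(group)
        labels = np.repeat(["A", "B"], [n - n // 2, n // 2])
        return pd.Series(rng.permutation(labels), index=group.index)

    users["group"] = users.groupby("spend_bucket", group_keys=False).apply(assign)
    print(users.groupby(["spend_bucket", "group"]).size())

Every spend bucket now contributes equally to both arms, so the comparison of message A versus message B is clean with respect to that covariate.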
The other aspect I want to quickly go over is the question of how you calibrate the thing you're trying to test. A lot of the time, when we do naive A/B tests, we're testing a message against a message, message A and message B, and what you find out when it's done is that message B is better, or whatever. But as a product manager, or a business owner, or a person who cares about some aspect of your business, you don't really care about message A or message B; you care about the mechanism, the fundamental trigger within the message. Take a message and decompose it: what do you say in it? You have a value proposition; you might embed an incentive; you might embed evidence; you have a call to action. Think about the value proposition. At PaySense we were often asking why our users came to us for loans: because we were convenient, in that you can do everything through the app instead of going to the bank? Or because our prices were low, with low interest rates? Or because of speed, the money going into your bank right away? If I'm messaging a user to get them to come back and complete their application, which of these should I focus on? Or, if I'm offering an incentive as an e-commerce site, should I emphasize a discount on the next purchase or a free giveaway? All of a sudden you start to realize that in a message you've got all these levers you can pull, all these things you can calibrate and toggle, in order to maximize the effectiveness of the instrument with which you're engaging your users.

But to do that, you need a clean experiment design process that lets you say: here's a library, a catalog, of the different levers I can pull in my messages; how can I systematically track them, and systematically vary them over time, so that I learn over time? Because you're not going to figure it all out in one A/B test. I want to drill this point home: you're not going to learn everything in one test; you have to keep doing it. To me, that's a core, fundamental part of the experiment design process: bringing structure to your thinking and structure to your testing, so that you're learning cumulatively rather than shallowly, in breadth. If I've tested a million different messages but none of them has any connection to the others, they're just blobs. If instead you construct them from levers, you can actually build on them cumulatively, and that's a powerful way to use experiment design.
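The levers idea lends itself to a simple catalog: enumerate the lever settings as a grid, so every variant maps back to interpretable components. A minimal sketch with hypothetical lever values, loosely borrowing the PaySense-style examples above:

    # Decompose a message into levers and enumerate the design grid.
    from itertools import product

    levers = {
        "value_prop": ["convenience", "low_rate", "speed"],
        "incentive":  ["none", "discount", "free_gift"],
        "cta":        ["finish_application", "shop_now"],
    }

    variants = [dict(zip(levers, combo)) for combo in product(*levers.values())]
    print(len(variants), "variants")  # 3 * 3 * 2 = 18
    print(variants[0])
    # In practice you would not test all 18 at once: a fractional design, or a
    # bandit over levers, lets each test contribute to cumulative learning.

Because every variant is a labeled combination rather than an opaque blob, each test's result attaches to a lever you can reuse in the next one.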
Talking about conditioning, one of the questions I had, maybe for Bhargava: what if there are hidden processes and variables that are impacting the outcome? You can condition against the columns you have in your database, but what if there are variables beyond them? In Bhargava's case, you don't even know the demographics of your users when you're doing this.

No, you wouldn't. Well, okay, let me take a step back. To Paul's point about experimental design itself: I think there needs to be a much more structured process around how the experiment itself is run. One part is the process itself; the second could be the campaign. For example, if I run a Facebook campaign, am I able to segment it better? Am I able to capture as many of the features that need to be in place? But then, unfortunately, you won't know everything in one test, or even ten. People's choices change, business evolves, and the data keeps changing over time.

There's an interesting concept called Fredkin's paradox, which basically says that when two options are equally attractive, it doesn't matter much which choice you make. So if you're going to tell me which shade of blue will maximize conversion... okay, it's not a great A/B test. You can't really claim "I've done an A/B test with different shades of blue and this is what maximizes conversion"; that's probably spurious, in my opinion, and the value it adds is going to be very low. It's also one reason so many SaaS sites look so similar these days: people have figured, "oh, this is well proven, let's go with it." Only the exploitation keeps happening; the exploring doesn't, which means you also don't know what else would work. That's one of the biggest challenges.

When you talk about Facebook advertising, Facebook is the hidden process: they have their algorithms, which keep distributing and redistributing based on their internal logic. How reliable are the results? This again goes back to shelf life: how should we understand the outcomes of experiments when these kinds of huge processes are involved?

But aren't these, as Paul said, insights that don't carry weight for much longer anyway? It's going to be continuously evolving. And we're only talking about one aspect of it, from outbound to inbound; there's a lot more that can happen. Think about the impact. The reason I want to drive this home is that the impact a specific experiment can drive changes from industry to industry. In clinical trials, experimentation is make or break: from the time a drug starts in the lab to the time it breaks even takes anywhere between 10 and 14 years. Whereas at an entirely performance-marketing-led e-commerce company, it's probably a matter of hours before a new shiny toy comes up and they move on. So we're looking at two different ends of the spectrum. And look at something like a startup: startups have limited access to people, limited access to talent, resources, money. You have to experiment to see which way to go, and you can't go all in; you make an educated bet in one direction and then keep getting better at it. Think about building any product as a long series of bets; it's just experiment-driven. That's what I'd say.

There's a fantastic book by Annie Duke, I don't know if you've read it: Thinking in Bets. Oh, Thinking in Bets, yes. I highly recommend it; she's a poker player.
Yeah, it's a brilliant book.

I do want to ask one last question, to bring this all together. We've talked about the clear need for experimentation, and we've talked about some of the ways we would go about doing it. My question is from a top-down perspective, not the bottom-up you were talking about. By that I mean getting leadership on board, getting the organization set up to build experimentation into its process. In your experience as business people, are there methods, or ways you've thought about, to prepare an organization and its infrastructure to enable this experimentation? Do you talk about the benefits, or do you just assume that at some point they're going to see it as a necessity? How do you sell experimentation? And I'll add one more thought on why I ask this: a lot of our audience are people who want to do the right thing, who want to take a very measured approach to this, and they have to manage upwards as well. So any thoughts on how best they can build that culture of experimentation would be a nice way to tie a ribbon on this.

Sure, yeah. When I started my career, and I keep referring back to this, it was in the US Department of Defense, on a small team that had just been set up to do large-scale computational modeling for irregular conflict, irregular warfare, in places like Afghanistan and Iraq. Most of my work was very systematic, but it was often used as material for higher strategic decisions, so it would get pulled in that direction. For a while I paid attention to this world of policy and national decisions, which I think is analogous in many ways to the senior levels of a business. And there was a movement there away from speculation and hand-waving, toward asking: can we be more rigorous, more data-driven, more empirical? One way of doing that was to put your prediction down, put your forecast down, and then see what happens. That, to me, is also a great way to lay the foundation for experimentation, because what it means is: we believe this is going to happen; we think this is going to happen. When you put it down, you have an opportunity to learn whether you were right or wrong, and why. And you also have an opportunity to go ahead and suggest possible alternatives. Is there some way you can carve out space, say, when you're making an organizational decision, to split your group or create some opportunity to try something different, and then evaluate it a little further down the road? I think that creates space for experimentation, and it helps you realize how non-revolutionary it can be as a first step, and yet what an incredible impact it can have. You don't need to turn your business upside down.
You don't need to change everything you're doing. But you can make these small directional changes, and maybe they'll change significantly where you are as a business two or three years from now. It's like two lines emerging from a point: they start very close together, but the distance between them grows, and that, hopefully, is a good thing. So that's my thought there.

Very nice. Are there any closing thoughts from you, Bhargava?

I would say: think about the explore-exploit framework. Testing all the time definitely pays off. And, emphasizing what Paul said, think from an entire-process perspective; don't be in a silo. The payoff is a lot more when you think that way. And think about organizational change as a series of structured experiments.

Very nice. So, gentlemen, we're at a minute past seven, just past our closing time. Thank you so very much. There will be comments that we expect on our page, and we will relay those to you. But if people were to look for you on the net, what's a good way to connect with you? Maybe your Twitter handles, since you're both active there. Bhargava, maybe we can start with you.

Yeah, Twitter is the best way to reach me. My handle is my first name, Bhargava.

Wonderful. Paul, how should people get in touch with you?

Yeah, I'm also on Twitter. It's @pmeins: "p", the first initial of my first name, and then "meins", the first five letters of my last name. But I'm also on LinkedIn, and my email is generally out there, so I'm happy to chat.

We will be adding both of your Twitter handles to the event page as well, so people can find them there.

Thanks. Thanks for the wonderful conversation. Thank you so much. And I will tell the audience that this is session number five for us, and with each of these sessions we catalog a bunch of takeaways; you can find those on hasgeek.com under The Fifth Elephant, in the archives of Making Data Science Work. We'll have the notes for this session up there as well.

With that, thank you everyone for having joined, and thank you to all of the audience members who listened in. One last announcement: in the upcoming session of Making Data Science Work, we will again be discussing tools, but this time tools for operationalizing fairness, accountability, and transparency. We have a couple of fantastic panelists, like the ones we have right now, to talk about how they're thinking about this whole space. Overall, we are big believers in a very disciplined approach to data science, and this continues the thread we have been on.

Thanks, everyone. Have a good evening. Bye, everyone.