So Paul and I go way back. It's my second time hosting this panel on industry best practices, and this year I have the honor of being joined by my colleague Scott McMillan to co-host this session with me. As Paul mentioned, we have two parts: the first will focus on experimentation and A/B testing, hosted by Scott, and the second on how we integrate causal science into prescriptive intelligence as a future direction. Before we start, I'd like to mention that all the conversations today represent the panelists' personal views, which do not represent their affiliations. Without further ado, I'll turn it over to Scott.

Great, thanks a lot, Victor. So we're really fortunate to have a great panel here, and I'm super excited to lead this session. I'm gonna do some quick intros, then we'll get right into some questions that we've put together. First of all, we've got Iavor Bojinov. Iavor holds a PhD in statistics and was formerly at LinkedIn, where he worked on the causal inference platform and experimentation. He's now an assistant professor at Harvard Business School, and his research focuses on experimentation and, more broadly, on how organizations can successfully integrate data science and AI. We've got Somit Gupta, a principal data scientist on the Microsoft AI platform. He has scaled the experimentation culture in multiple products, including the Edge browser, Windows, Office, and now Microsoft 365 Chat. He's an expert on metrics and is leading the effort on evaluating LLMs through experimentation across Microsoft. We've got Amit Mungdal, who leads digital analytics and experimentation at American Express, responsible for improving outcomes in all things digital: customer acquisition, servicing, payments, and platforms. He ensures experimental methods are used for rolling out website, mobile app, and email design changes. And last but not least, we've got Ben Skrainka. Ben has helped numerous stakeholders achieve their objectives by leveraging experimentation, machine learning, and statistical analysis. Currently, he serves as a data science manager at eBay, where he provides thought leadership to drive experimentation culture and methodology across the organization. So as you can see, we've got a great mix of people with experience in industry as well as academia.

Quickly, for myself, I'm at Fidelity Investments. Victor and I form a lean and mean team focused on experimentation for one of the key business units, and I learned experimentation on the job, really, over the last 12 years, leading experimentation programs and teams. At Fidelity, we focus on driving a culture of experimentation, expanding the footprint to new teams, channels, and initiatives, and establishing quality methodologies and standards, all with the end goal of allowing our product teams to innovate with a safety net and use data and experimentation to make decisions. So with that, we'll dig right in. And Victor, you can keep me honest on the time here. I'll do my best; Victor and I would love to talk for hours with these folks, but we will work within the constraints of the conference. First, to start, we'll talk a little bit about experimentation value and KPIs. A broad question to start: how can or should experimentation contribute to overall enterprise goals? Meaning, what's the role that experimentation can play? Maybe Somit or Amit could start us off here.

I'm happy to go ahead.
So yeah, last year, Yavor and I thought about this question a lot when we were putting together our article for the Harvard Data Science Review. And the answer we came up with was that there's impact at two levels. One is at the individual experiment level: a good A/B test allows you to understand the causal impact of your change, which is really powerful, because you have true accountability that your change is causing this metric to move. That, first of all, is really good in terms of direct feedback to the feature team, but it also gives us confidence to roll out that change in a safe manner. Before we ever expose anything to our customers, we should always make sure it's at ship quality, but even after that, we need to be humble that we may not have fully understood our customers' needs. So A/B testing allows us to safely roll out a change by incorporating user feedback early. At an organization level, what it does is even more powerful. One, it allows for consistent, democratized, and scrutinizable decision-making across the board. We can have set policies that these metrics should move in this direction, and only then will we ship. And then, as we make multiple ship decisions across the board, we will have data about the things we shipped, and we can even go back and evaluate whether our strategy or decision-making was good or not. That's really powerful. The second thing it does is democratize the ability to explore lots of ideas. We talk about how the ROI from ideas is kind of like a bell curve: from most ideas you get no impact, or maybe a little, but there are some ideas that can really take off your product or tank your product. So in all cases, experimentation helps you safely explore this universe of ideas and get the most bang for your buck.

Amit, would you like to add something?

I very much should. So, thanks for kicking us off here, Somit, and very glad to be here with all of you. In my role, I'm gonna bring in a little bit of the realist's view in terms of how experimentation can and should add value. The short part is very easy to answer: yes, it should add value. And if anyone has ever worked in a large company, you would know that if it doesn't add value, it's not gonna get funded for next year. So unless it's adding value, and unless you can connect the outcome of your program to something the big boss cares about, you're gonna be running short of money very quickly. So I think it's very important that as we think about being purists when it comes to setting up an experiment, reading the results of an experiment, and designing an experiment, we also keep the perspective of the business in mind, which in some ways cares a lot more about impact and less about how you get to that impact, of course within moral, ethical, and regulatory boundaries. And within that, I think of experimentation's impact in two different ways, and this framing has been popularized by Jeff Bezos. There are one-way doors, the very large strategic goals that companies or business units have, and then there are everyday problems that crop up in terms of design changes, UX changes, algorithm changes, offer changes, personalization changes, and so forth. And for the second set of problems, we must incorporate experimentation.
I don't believe there are better ways of answering those questions than an A/B test, you know, set up as an RCT, and if you don't do it, you're gonna get answers that, over a period of time, have more errors than you would like. Now coming to the first set of problems, the large strategic problems, the one-way doors: they may or may not be amenable to experimentation. We at American Express are a credit card company; if we wanted to launch in an entirely new country or launch an entirely new product, could we experiment our way into that change? Probably not. I mean, you gotta have some articles of faith that tell you it's a good idea. You gotta have a set of analyses that points to the fact that it is a great idea. And then, hopefully, experimentation can add to your understanding of the problem space. And there, you know, I do not insist that every such problem we have at American Express is experimented into. Hopefully this gives folks a bit of a framework to think about as they consider experimentation in their own companies.

Great, thanks for that. Ben, I'm gonna shift to you, just to keep us moving along, and have you start off our next question if that's okay. So when programs, teams, and organizations are running experiments and getting learnings, and, as Amit said, gauging some impact, how would you suggest teams and organizations work out the tension that naturally exists between short-term experimentation findings and how those can impact longer-term strategic decisions?

Yeah, this is a challenging question that many of us struggle with on a variety of dimensions, across teams and even in the short term. I think the heart of this is a lot around metrics, because, as an economist, I think that once you start measuring something, that creates an incentive, so leadership needs the right metrics that align people with the long-term incentive. Now, measuring long-term performance is super difficult, particularly if your world looks like mine, where you deal with things like cookie churn, so you can't run long-running experiments. It's very difficult. Long-term holdouts are very unsatisfying, so you're then in the world of trying to create identified proxy metrics. There's a paper by Susan Athey and others on this, but again, that's a very hard problem too. And when we let teams just choose their own metrics, it's very easy to have an explosion of metrics, and then you have confusion, and people are optimizing their part of the funnel at the cost of someone else's part of the funnel. So having leadership that can take a stand and get some consistency, at least within verticals or key parts of the customer experience, where there's an OEC, an overall evaluation criterion, is super important. The other thing, which Somit has done a lot of work around, is the measurement of metrics: understanding how we find good metrics and also create an institutional knowledge store. But it goes back to measurement, because once you start measuring, that's an incentive. So I think I can pass it off to someone else.

Great, thanks, Ben. And Ben, I'm almost disappointed not to see your dog walking around, but maybe she'll show up later.

Oh yeah, I'm single parenting, so you'll probably hear some barking. Fortunately, she's sleeping. I don't know if you can see her. Yeah, she's usually more active.
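Ben's point about identified proxy metrics is roughly the surrogate-index idea from the Susan Athey et al. work he cites. As a minimal illustration of the mechanics, not the method any panelist's company actually uses, the sketch below uses simulated data and hypothetical column names; the strong assumption being leaned on is surrogacy, i.e. the treatment moves the long-term outcome only through the short-term surrogates.

```python
# Minimal surrogate-index sketch: learn long-term outcome from short-term
# surrogates on historical data, then score experiment users with it.
# All data is simulated; column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Historical (non-experimental) data where the long-term outcome is observed.
n_hist = 5_000
hist = pd.DataFrame({
    "week1_visits": rng.poisson(3, n_hist),
    "week1_purchases": rng.poisson(1, n_hist),
})
hist["year1_revenue"] = (20 * hist["week1_purchases"] + 2 * hist["week1_visits"]
                         + rng.normal(0, 5, n_hist))
surrogates = ["week1_visits", "week1_purchases"]

# 1) Learn the mapping from short-term surrogates to the long-term outcome.
model = GradientBoostingRegressor().fit(hist[surrogates], hist["year1_revenue"])

# 2) In a new A/B test only the surrogates are observed so far, so score each
#    user to get a predicted long-term outcome (the "surrogate index").
n_exp = 2_000
exp = pd.DataFrame({
    "arm": rng.choice(["control", "treatment"], n_exp),
    "week1_visits": rng.poisson(3, n_exp),
    "week1_purchases": rng.poisson(1, n_exp),
})
is_t = exp["arm"] == "treatment"
exp.loc[is_t, "week1_purchases"] += rng.poisson(0.2, is_t.sum())  # simulated lift
exp["surrogate_index"] = model.predict(exp[surrogates])

# 3) Compare the surrogate index across arms as the long-term proxy metric.
lift = (exp.loc[is_t, "surrogate_index"].mean()
        - exp.loc[~is_t, "surrogate_index"].mean())
print(f"Estimated long-term lift via surrogate index: {lift:.2f}")
```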
All right, Somit, you wanna chime in on this one?

Yeah, so I think it goes back to connecting to the value of experimentation as a program. It's always easier to experiment around the short term and the here and now, but unless we connect the short-term, here-and-now conversion-funnel improvements to the overall business, financial, and customer metrics that drive decision-making at a leadership level, we are gonna find that, in the long term, the value of experimentation as a program has diminished in the company. So I strongly encourage my team to think about what they should look into as part of the experiment that is gonna drive the immediate decision-making, but they should also look into other, longer-term goals. For example, revenue. I can't believe a company would not look at revenue as a goal, and yes, you might be improving your part of the funnel, but if the kind of customers you are now converting get you lower revenue than whoever was coming through before, hey, that's not gonna be a win, or your financial leader is not gonna like that outcome. So I feel it's important to look at the immediate experimentation-related metrics, but definitely connect them to the business, financial, strategic, customer, and regulatory goals that also drive decision-making for a major part of the organization. The other way I like thinking about this is: what part of the funnel are we experimenting on? A lot of us experiment in the digital space, and I completely agree that it's very hard to experiment on pure prospects, where cookies are unstable, or there are not gonna be many cookies left in a couple of years. So how do you make sure that if those customers come back a second time, you're giving them a consistent experience, and problems of that nature? Whereas when it comes to existing customers, and you are experimenting on journeys that are behind your own login or other controls that you have, I feel it's a little bit easier to set up populations that are either in the test or in the control and track them over time. And I think, as experimenters, we also need to get comfortable with longitudinal studies of customer behavior, because we have seen that experimentation results sometimes have a very different magnitude immediately after an experiment is launched versus at a more steady state, which can be two weeks, four weeks, or two months down the road, when customers are more used to that new journey or experience. So for those reasons, you've got to really have that mix of short-tenure versus long-tenure metrics in an experimentation program.

I mean, can we dig into that a little, if you care to? I appreciate what you said, and a couple of approaches come to mind. Just a little bit more tactically, are you saying you wanna track the behaviors of those visitors, whether they're customers or prospects, over the longer term, but limit your tracking and analysis to those customers? Or are you employing other analyses that might show, hey, if we get visitors like this, or new customers who look like this, we know based on a broader analysis that even though we're measuring short-term click-throughs or new signups, you've already made that connection to those longer-term metrics or strategic goals?

Yeah. I mean, I think the answer to that question, Scott, operates at two different levels. One is simply prospect versus customer.
I think for most companies, the prospect population changes over time, and over very short durations as well. Whoever is coming to your website today is not the same person who's gonna come to your website tomorrow; it's just a different group of people. So that is a challenge that's hard to solve. For that challenge, it's gonna be hard to say, well, did the same customer behave differently? I don't think there's any logical way of answering that question. But you could argue that some business metric being generated as an outcome of the digital funnel can be compared over time. So both conversion rates as well as, perhaps, revenue acquired per customer are comparisons you can make, but it's not gonna be the same set of customers. Now, when it comes to existing customers, I think the dynamic changes, because, at least for financial services companies, and I won't say this is a universal answer, but for financial services companies, when you use the mobile app or when you come to our website, you log in as a known customer, and therefore I can track your long-term behavior, both in response to the experiment in the short term as well as in the long term. And of course I can also do what I said in the first instance, which is look at the short-term, experiment-related behavior as well as the over-time revenue or other metrics the company's interested in.

Great. All right, we gotta get Yavor some screen time here, so let's keep moving along. Yavor, can you get us started? We touched on this a little bit, but I think there's more you can add here, on how to measure, aggregate, and communicate the value of an entire experimentation program. You know, say I see the product team is adding headcount, marketing staff are adding headcount; how do we show that value and maybe continue to grow the experimentation team?

Yeah, thank you, Scott, and thank you for the great discussion. This is a problem I actually thought a lot about when I was working at LinkedIn; it was a project given to me by the PM of the experimentation group. What we did was take a few moments and think about where experimentation adds value, and there are three pillars you can think about. Experimentation allows you to measure the difference made by whatever it is you're doing. That's what Somit was telling us, and we've already heard it from Amit and Ben. So you have this measurement piece, which is absolutely critical. The second pillar is de-risking: the way experimentation de-risks is that if you have a treatment that's really bad, you're able to switch it off, potentially before it causes too much harm. And this, potentially, you can measure. The third pillar is learning. And by learning, I mean that when you run the experiment, you figure out maybe what's working well and what's not working well, and you can try to incorporate that knowledge into the next iteration of the design. So what we did at LinkedIn, and we actually have a paper online on this, is we basically realized, well, we can't really measure the value of measurement. That's really, really hard, because what's the value of knowing that this is a 2% lift versus a 3% lift, right? So we can't really do anything about that.
The de-risking piece of it, we actually can measure, because we can look at the past year's worth of experiments and ask, okay, which experiments did we shut down? And of course there's a hidden assumption here that says, if we didn't have the experimentation platform, we would have gone ahead and deployed this to everyone, right? So we can use that in our tallying up of the benefits of experimentation. And then for the learning piece, this is where companies actually have very different approaches, but one thing LinkedIn is very much in favor of is iterative experimentation, where you build an early version of something, you test it internally of course, but then you deploy it to, say, one to five percent of people, and you use that information to figure out how you can further improve the product. I can give you a very concrete example. In one instance, LinkedIn launched a new algorithm and noticed it was not performing very well for specific groups of people in the first iteration. They used that information to go back, adjust the algorithm, and basically update it so that it was able to do more personalization and perform better for those groups. So we can attribute that delta, that learning, to the experimentation platform. So if you take this and tally it up, you have the de-risking and the learning. When we did that at LinkedIn, we showed that the experimentation platform, if you add those two things, leads to an additional improvement of about 20% on one of the key tier-zero metrics. And then you can take that and break it down by specific use areas, so you can see which areas are getting more value. So that's really the most successful way I've seen of trying to communicate and capture that value.

That's great, I love that framework; we might incorporate some of that into our approach. Ben, any thoughts on this last question before we shift to our second phase?

Yeah, I think it's something that's really a challenge, right? We'd love to measure the benefit of adding a marginal data scientist to a team, or a marginal A/B test, and it's hard to do. One of the things we did at eBay is we ran a maturity exercise that I catalyzed shortly after I started. I wrote a white paper and said, there are a lot of things that don't look right. And some leader said, oh, this Ben guy, he's an idiot; and then he said, well, maybe he's not. So then we convened a cross-business-unit team and we did a maturity model, kind of like the Fabijan work at Microsoft, with our own wrinkle, where we thought about all the things that would be important to have in our experimentation platform and also what was best practice in industry. Then we ranked all the verticals based on how they were doing, from crawling to flying. And then we could see which verticals were doing a better job in experimentation, and you could also try to correlate that with performance. There's also the Fabijan paper about the flywheel, and helping people understand that there is this experimentation flywheel. You can also shift culture by showing people that if you're making decisions based on intuition, you're probably wrong, by presenting things like metrics around what fraction of ideas actually ship; we know that that's like one out of ten to one out of three.
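Iavor's de-risking tally is essentially an accounting exercise: sum the harm avoided by experiments that were shut down, under the stated assumption that without the platform those changes would have shipped to everyone. A back-of-the-envelope sketch with entirely made-up numbers, not the actual LinkedIn calculation:

```python
# Back-of-the-envelope tally of the "de-risking" value described above: sum the
# harm avoided by shutting down bad experiments, assuming they would otherwise
# have shipped to 100% of users. All numbers are illustrative.
shut_down_experiments = [
    # (name, observed metric impact in %, traffic share the experiment ran at)
    ("ranking_v2",   -0.8, 0.05),   # -0.8% on the key metric, exposed to 5% of users
    ("new_onboard",  -1.5, 0.01),
    ("pricing_test", -0.3, 0.10),
]

avoided_harm = 0.0
for name, impact_pct, traffic_share in shut_down_experiments:
    # Harm avoided = what the metric would have lost at full rollout,
    # minus the small loss already incurred while the experiment ran.
    avoided_harm += abs(impact_pct) * (1.0 - traffic_share)

print(f"Metric points of harm avoided by early shut-downs: {avoided_harm:.2f}")
# Adding the measured deltas from iterative "learning" improvements to this
# number gives the combined de-risking + learning contribution.
```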
And so there are some indirect ways to get at this value, or at least get a sense that this is making things better. But as Yavor points out, it's very hard to make this quantitative, so I think it's still a work in progress for many of us.

Yep, great. All right, that was a really great first half of the panel. Let's shift gears a little bit; the second part will be a bit more forward-looking, and I've got two key questions here. Yavor, you talked about learnings as one of the pillars. So I'm hoping either you or maybe Amit can start us off, and then we'll go back to Ben. How would we communicate and aggregate those learnings for institutional knowledge sharing? Teams are working away and, depending on the organizational structure, they can be pretty siloed and isolated. So how can we communicate and aggregate that so the organization as a whole can get smarter?

Where do you want us to start? I mean, I think this is a really hard problem, right? The institutional knowledge, communicating that, sharing that. And it's not just about experimentation; this happens with all institutional knowledge. I've seen people try many different versions. They've been successful where sometimes there'll be internal repositories that you can search for the experiments that people have designed. But I haven't really seen a clever system that's able to say, once you start your design, hey, here are the five experiments that most closely resemble yours. This is something that new technologies that are coming out, like generative AI, maybe could actually help with in disseminating this knowledge. But I'd love to just throw it out to some of the other folks on this call, who probably have great ideas on how to do this internally.

Yeah, I'm up. So yeah, thanks. That's certainly something we think about. First of all, it's building an institutional knowledge store, because we know about availability bias: it's very easy to have certain ideas, and it's harder to have other ideas. And so I really push people at eBay to make sure you're exploring all of the product characteristic space. As an economist, I think of my product as a bundle of characteristics, and with availability bias, particularly since MBAs like to change teams regularly, it's very easy for them to run certain seemingly obvious but bad experiments that you would like to only run once. So building an institutional knowledge store is helpful, but it also has to be accessible, and generative AI is hopefully gonna be that magic thing we need to make that happen. We do a couple of things at eBay that are, I think, easier, low-tech things. We encourage people to send out launch announcements, and there's a template for people to do that; we're using generative AI to make it easier to build the launch template. And again, the goal is that you want a statistics-free write-up of the experiment, so your MBA who's new and hates statistics can understand it and make a good decision. The launch announcement is also great because it rewards and acknowledges the people who worked on the feature, and that is, I think, a powerful carrot to get people to run more and better experiments. We also track the velocity from ideation all the way to when the thing ships, or it doesn't.
I think the things we don't do well that are important are negative results, because otherwise we can end up with this bias where we only ever publicize what we ship and bury the negative results, which are also very important for learning. Because, and we certainly have teams like this, they run an experiment, it gets a negative result, you learn from it, and that enables you to build that magic feature. The other part is aggregating results. I'm sure we're all beholden to someone in finance if you're in industry; finance people wanna know, what did you do over the quarter? And if you just add up all the lifts, then you're gonna have problems from selection bias, and there's that Airbnb paper about how to correct for that if you're lucky enough to have normally distributed data. And there's a newer paper, Inference on Winners, by Andrews, Kitagawa, and McCloskey. I've seen some simulation results showing that if you've got a small lift, it's much more likely to be exaggerated. So again, we can be smarter about how we correct for this bias, conditional on shipping when you get a statistically significant result. So those are some ideas, and I think that's probably good for now.

Yeah, that's great. So for those of us in companies and organizations that are struggling with this, I think there's no magic bullet here. You try different things: try what Ben said, what Yavor said, talk to your peers. We try a bunch of things at Fidelity; it takes a couple of different approaches, I think. But you mentioned some of these newer technologies maybe making it easier, so let's broaden that question, and this will maybe be our wrap-up question for this panel, we'll see how it goes. What do you see the future of experimentation looking like, not next year, but five or ten years down the road, in the context of what we know is in place today or is beginning to emerge in technology advancements like generative AI and things like that? Yavor, do you wanna start, or Somit?

Somit, why don't you start, and then I'll piggyback off you after.

Okay. So yeah, my manager and I just posted a blog post about large language models and a framework on how to measure, or evaluate, large language models. Our basic thesis there is that with these advancements in machine learning, especially in large language models, traditional ways of evaluating machine learning models are becoming less and less sufficient for understanding how well a model does and, even more importantly, how well the application of that model does. So in that sense, A/B testing has become even more critical, especially in cases like prompt engineering, where small changes can have really big impact. So I see A/B testing becoming even more front and center in these problem cases. The more interesting part, or the next question after this, is what kind of metrics we should be looking at when we try to evaluate these large language models, and that is what our framework proposes. It's based on the signals that we get from the Azure OpenAI API and the signals that you might collect in your own product. We have learned that there are a few measurement needs that have to be satisfied for evaluating large language models. One is GPU utilization; we all know GPUs are costly. Second is responsible AI; we wanna make sure these models are behaving well. And third is performance.
We wanna make sure the model's performance holds up, so the user experience doesn't suffer. And lastly, and most importantly, utility: what is the end product of this model, and how is it being used by the end customers? So I see a lot of development with large language models going on in the space of A/B testing, as well as development of new and exciting metrics that would allow us to find those indicators of long-term success in short-term A/B experiments. Yeah, I'll pass it over to you.

Yeah, no, that was great, and I agree with you. I think experimentation is gonna get even more important. Somit was talking about evaluating the large language model itself; I think, on the flip side, because let's be honest, generative AI is shaking up the business world right now, and the reason it's shaking up the business world is that it's lowering the cost of generating many, many different versions of the same thing. Let's take marketing, for example. Before, you needed a person to write five different mini-adverts, and then they would go run an experiment with five arms. Now with generative AI, once you've done what Somit has said, which is evaluate the LLM, it's working well, you're happy with everything, you then have to put on the next layer of evaluation, which is, let's say someone uses this to generate massive variation. Well, experimentation is gonna be the gatekeeper. That's the thing that's gonna determine which of your 100 different ads, not five but 100 different ads that you've now generated, is the one you're gonna use. Or maybe it's not gonna be just one; you're gonna be doing some personalization. So because of this, I think experimentation is gonna get more and more important. I think in the future you're gonna see, and we've already started seeing, things like multi-armed bandits or dynamic experiments becoming more and more integral to the experimentation platform. I see these two things over the next few years really being bridged and coming together, and really it's not one or the other: we should be doing dynamic experiments. And then the final piece of where I see this going is, we've already talked about it, experimentation is all about the measurement. On the flip side, it also gives you suggestions of where you might wanna innovate. But this is where causal inference comes in: of course, causal inference can be used for measurement, but causal inference is also the link that tells you, okay, I have all these ideas, how do I narrow them down? So I think what we're gonna see in the future is the completion of the causal loop, where you have experimentation working really closely with causal methods. Causal methods give you the ideas, and then experimentation helps you evaluate them. So that's the future we're going toward, just to summarize. We're gonna have to do lots more experimentation with many, many different treatments, think hundreds and thousands of treatments. Experimentation is gonna be more and more dynamic, because when you have a thousand treatments, you can't just run a thousand different treatments in an A/B test; you have to be much smarter about it. And we're gonna complete the causal loop, and this is perfect, because this is what the next panel is gonna be about, right? And I didn't do that on purpose, but maybe a little bit. So that's where I see it going. I mean, to all the people participating: you've picked a good field to be in.
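Iavor's point about a thousand treatments is where bandit-style dynamic experiments come in: instead of splitting traffic evenly across every generated variant, allocation adapts as evidence accumulates. A minimal Thompson-sampling sketch on simulated conversion data, not anyone's production system:

```python
# Minimal Thompson-sampling sketch for the "many generated variants" situation
# discussed above. Conversion rates are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_arms = 100                                   # e.g. 100 LLM-generated ad variants
true_rates = rng.uniform(0.01, 0.05, n_arms)   # unknown in practice

successes = np.ones(n_arms)                    # Beta(1, 1) priors on each arm
failures = np.ones(n_arms)

for _ in range(50_000):                        # each iteration = one user
    # Sample a conversion rate for every arm from its posterior, play the best.
    sampled = rng.beta(successes, failures)
    arm = int(np.argmax(sampled))
    converted = rng.random() < true_rates[arm]
    successes[arm] += converted
    failures[arm] += 1 - converted

posterior_mean = successes / (successes + failures)
print(f"Most-played arm: {int(np.argmax(successes + failures))}, "
      f"best posterior mean: arm {int(np.argmax(posterior_mean))}, "
      f"true best: arm {int(np.argmax(true_rates))}")
```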
It's already important, and it's gonna get even more important as we automate and augment more and more tasks.

Yeah, I mean, like I said, I've been in experimentation for a while, and it always seems like there's something else down the road to get excited about: the North Star, a paper published by one of you folks, an article, a conference, something to apply. And that best-in-class finish line keeps moving forward, in a good way. So that's definitely one of the reasons I love the field. Ben or Amit, anything you wanted to add on this question, or should we go to the last one?

Oh, I have a very... You guys go ahead. I have a very science-fiction-ish view of this one, which is that I'm hoping for artificial entities who are gonna replace existing customers, and you can then expose your variants to these artificial entities, and that's gonna give you the answer instead of waiting for customers to come to your site. I'll not hold my breath for that future, but hopefully we'll come to something like that, and we'll see where we net out. By the way, that was a joke.

I was gonna say, hopefully they have money to spend too.

Hopefully they have money to spend. Yeah, I'm a huge fan of simulation. I think there are people who build these offline playback mechanisms already, and maybe we can get smarter about that. My closing thought on this is that while all these methodological and infrastructure advancements are amazing (sorry, that's the dogs), I think the limiting factor is culture. That's the gating thing that really limits how well we do. Some of you may be fortunate to work in a high-functioning culture, and others work in cultures that are challenged, where even getting people to run good, bog-standard frequentist A/B tests well and reliably, in a trustworthy way, and accumulate knowledge, is hard. And so improving our ability to effect culture change and build that culture will make these new methods have even more impact. So I would just encourage people: don't forget about culture, because we have to bring people along, and making experimentation simple is another place where we can use these technologies to increase the impact, so that you can imagine an MBA talking to Siri and Siri running an A/B test for them.

Yeah, Ben, I couldn't agree more. It's exciting to keep pushing the envelope on standards and methodology and approaches and technology, but at the end of the day, you still need to work with the teams to continually, like you were talking about with the de-risking and the value of learnings, get them to embrace experimentation. And then they can take advantage of everything a team or an experimentation COE has put together. So, all right, well, let's start to wrap up. I see maybe one question; Victor's going to monitor those, but if you haven't noticed, there is a Q&A functionality, typically at the bottom of your Zoom panel. So if you have a question, we can try to answer it live, or perhaps after the conference we can aggregate these and get responses back to you. So let's wrap it up with this: I know a bunch of you touched on these, but is there any final challenge that you see for teams and enterprises in the experimentation space, maybe looking a little bit shorter term into next year, and gaps they might need to close?
Any suggestions there?

Maybe I'll throw one out there. I think a lot of us here have experience doing B2C experimentation. The place where experimentation has really not taken off, because basically A/B testing doesn't really work, is B2B, where you have very few customers, or you don't even have direct access to the customers and you can't randomize. Think, you know, P&G: they don't have direct access to their customers. So I think this is one of those areas where, if you speak to those companies, they don't really know what to do. So this is a huge opportunity if we can figure out how to help those B2B companies.

Thanks. Anybody else have thoughts on this one?

I mean, my view is that a lot, but not all, of B2B is really selling to human beings, right? Ultimately, at the end of the day, it's not enterprises talking to enterprises. I mean, they are in effect, but in reality it's one human being talking to another human being, or if B2B customers are buying digitally, ultimately they are on your website and a human being is on your website. And to that extent, everything that you know and have learned about how B2C experiments have added value, I think much of that learning would be usable for a B2B customer set. Now, there would be things beyond that, where functionality-wise B2B customers have different needs, and it's perhaps a little less likely that we would ever get to the right sample size and things like that for B2B markets. But still, I think there is enough learning in that B2C segment that can probably be transported over, and I feel, without trying to boil the ocean, that's a great place to start.

Yeah, I would say the challenges continue to be metrics and culture. How do you know we're measuring the right thing, and that it's the right thing for the long term? And then how do we encourage the organization to do the right thing, or follow a culture that would lead to the right outcomes? These are not new problems, but with more advancement in large language models and AI in general, it'll be harder for us to just think about it in a meeting room where you just have two versions, A and B. It's a lot more personalized output that each customer is getting, to answer their own particular question. So we have to rely more and more on the right kind of measurements, the right kind of metrics, to make sure that we are serving our customers well, responsibly, and in an inclusive manner.

Right, so let's wrap up. We'll ask for very brief responses, but we have a couple of audience questions. One is, and I think this, Ben, maybe piggybacks off what you said: what's your suggestion on changing the culture when working with teams or developers who feel very attached to the products they've built, and all they look for is positive results from the experiment?

Yeah, that's a great question, and I think we've tried to attack it in a lot of ways. So sorry for the dogs, they're very excited about something.

Experimentation?

Yeah, clearly experimentation. Let me mute my speaker. That's fine. We've tried to attack it in a couple of ways. The long-term one is training. We've made a big investment in training, and I think e-learning was a really good choice there. We chose to build an e-learning class so it's available on demand, with short, hip, TikTok-style videos, so you can dip in and see the video for whatever you need to understand. I assigned it to the entire product org. Compliance was like 7%.
So maybe not what you would hope for, but we're trying to create incentives for people to run good experiments. I know companies have published leaderboards or had people bet. You could also do things like create awards for the most surprising results or non-results, blog articles, and also just general data literacy training. You'd be surprised when you ask basic questions to VPs in some organizations, like, is correlation causation? And so there's still a lot more education we need to do. Also, older companies: this is certainly something we've been trying to shift at eBay, and I don't know if you have it somewhere like Microsoft, but at these companies that come from a very strong tech background, it's very hard to help some of the older leaders flip to understanding that data is now leading things, and it's not all software engineering. So I've probably said enough to get myself in trouble if there's anyone from eBay legal listening. So yeah, please don't turn me in to the Gestapo.

All right, thanks, Ben. So we'll wrap up, just to keep within our time limit. Let me just pause here and give a sincere thanks to all the panelists for taking the time and being open to sharing your views, and clearly your excitement about experimentation today, but also for acknowledging and sharing that there are challenges out there, and that the future is really exciting. So those of us tuning in today, I think, are even more excited about our roles, our companies and organizations, and the future. So with that, we will wrap it up and I'll turn it over to Victor. Thanks, everybody.

Thank you. Thank you, Scott, and thank you very much again to all the panelists for a wonderful session on experimentation. At the end of the last panel, multiple panelists discussed many exciting, sci-fi-ish future directions in which experimentation, or causal science in general, can make a great impact. In this panel, we try to pick up where the last panel left off and focus on one specific future direction that causal science can target, which is prescriptive intelligence. Let me share my screen really quick. It might be a buzzword many people have seen in the media, so let me give a little bit of clarity on what we mean by prescriptive intelligence. This is just my definition, and I'm open to any challenges to it. Prescriptive intelligence could be defined as a set of capabilities that combine knowledge from the past and predictions of the future to recommend the best actions to take, given known constraints and a choice set. However, the term prescription implicitly carries the meaning of causality: if we do something, we expect something else will happen as a consequence. But believe it or not, causal science is currently underutilized here, and the question is why, and how do we close the gap? So today I have the honor of being joined by three experts who have deep knowledge in this area to help us understand how to close the gaps. First, we have Patrick Dup, who's a principal applied scientist at Zalando, based in Berlin; it's a fashion e-commerce company. Patrick is currently working on algorithmic pricing and has worked on A/B test infrastructure and on using causal inference on observational data to answer business questions. Second, we have Thomas Baudel, joining us from Paris, France.
He's the chief science officer for IBM AI decision coordination, a product to help process owners oversee and optimize decision processes that involve both humans and algorithms. He's also an adjunct professor at the University of Paris-Saclay. Last but not least, my colleague Victor Lo, who's an SVP of data science and AI at Workplace Investing at Fidelity Investments. He is an analytics executive, consultant, and thought leader with three decades of experience in applying and managing quantitative analytics in multiple fields. He is also a pioneer of uplift modeling, and he's currently finishing a book entitled Cause and Effect Business Analytics. So in the next 30-ish minutes, our discussion will take two parts: we'll first discuss how to build up causal science capabilities in an enterprise, and then we will discuss how to connect such capabilities to prescriptive intelligence. Now let me stop sharing.

So, the first question to kick off the panel, and this question is particularly for Patrick, since it's part of your job title: in addition to experimentation, which we discussed extensively in the last panel, what are the other causal science capabilities that people can use in a business setting? Maybe Patrick can start, and then I'll turn it over to Victor.

Sure thing. Thanks, Victor. So there are a lot of other capabilities, and some companies, some bigger companies, will have most of these, while other companies will have a subset. The most obvious one is observational causal inference, the things that we can't A/B test. Some examples from my past have been ones where you just can't randomize over customers: you're offering a new product and customers really choose whether to opt in. There are versions of A/B tests, maybe you could do a holdout group or something, but you can also use observational causal inference to get you there. And then there are other times where it's just politically infeasible to A/B test; politically may be a strong word, but internally decisions are made not to A/B test, and you still want that information. So observational causal inference tools are really useful. There was a great discussion yesterday by my colleague at Dream11 about how they've built a framework to enable this inside the company. But beyond that, we have things like recommendation systems. What recommendation systems do, on a homepage say, is that e-commerce firms and other internet firms choose what to show the customer, what to show the user, and we're choosing from a set. Historically this has been a machine learning problem, a lot of supervised learning, but it's a causal problem: we wanna know the incremental uplift of choosing a particular piece of content. And so we can use causal tools to improve existing algorithms. Another thing I'll mention is thinking about how the company aligns incentives with metrics and KPIs, how that's nested within departments, and how everyone is aligned towards a North Star. We need things like proxies and surrogates in there, and any little bit of causal science can help improve the quality of the choice around the metric that people should be shooting for. So causal science can help with the design of that, but it can also help with providing measurements of those key proxies. So you might have a particular product that you want customers to sign up to.
And so if you know the incremental value of that, other teams can use that value in their measurement to measure how well they're doing. So those are three things, three causal science capabilities, that I think go beyond A/B tests.

That's a great start. Thank you, Patrick. And Victor, would you like to add anything?

Sure, yeah. Thank you. I personally encountered this in multiple waves. In the late 80s, when I was a graduate student, I was asked by Procter & Gamble to understand the impact of pricing and shelf space, which was called distribution, as well as advertising, on some of their products versus their competitors' products. That was the first time I encountered causal inference. The typical technique people used at the time, and even now, is mostly time series analysis, which people now call marketing mix or advertising effectiveness modeling, and which has been around since the 70s, maybe even earlier, but at least the 70s. That is mostly based on some fairly heavy assumptions: you need to assume some kind of advertising carryover structure and so on. And it's also at the aggregate level, so instead of individual-level data you have aggregate-level weekly or monthly data to analyze. I did that kind of thing quite a bit, subsequently, in a management consulting firm. And then I encountered many, many similar kinds of problems, either at the individual level or at the aggregate level, basically using some kind of assumed causality. Later on, my career ended up involving a lot of CRM or database marketing kinds of work, which require a lot of A/B testing, RCTs, and which obviously were very scientific. Then, a couple of decades ago, I was asked to measure not only direct mail, email, that kind of thing, those are easy, because most large corporations would go through A/B testing or RCTs, but also the success of a seminar. Let's say a large company sends executives or leaders to give talks to customers, and it's taking them away from their day job. So they actually asked: how do you know my seminars are effective? Now, that's difficult, because when people on the street see the sign at a hotel saying there's a seminar here from so-and-so, they might come in or not. You cannot block them from going in and say, hey, you're in a randomized control group, we cannot let you in. That's not gonna work. So that's the first time I moved to more of a causal inference approach, a more scientific approach, that is, one that minimizes the need for an RCT. The approach I used at the time was mostly Rubin's causal model, which was largely propensity score matching. And later on, I discovered difference-in-differences and some other methods, regression discontinuity design and so on. Those are also subsequent methods. And later on, I learned about Pearl's causal approach, which is also heavily used, at least in my job now and over the past couple of decades, especially when you don't know which variable may be causing which. In a Rubin causal approach, you need to understand which one might be causing which one, and then you need to control for confounders. With Pearl's approach, you may not even know which one may be causing which one. So not only can you identify the effects of causes, but you may also be able to identify the causes of effects, using some kind of structural learning or causal discovery approach.
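For readers who have not used the Rubin-style tools Victor lists, here is a minimal propensity-score-matching sketch in the spirit of his seminar example, where attendance cannot be randomized. The data, column names, and effect sizes are simulated and purely illustrative; a real analysis would also check overlap and covariate balance.

```python
# Minimal propensity-score-matching sketch: estimate the effect of attending a
# seminar when attendance is self-selected and confounded. Simulated data only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
df = pd.DataFrame({
    "age": rng.normal(50, 10, n),
    "assets": rng.lognormal(11, 1, n),
})
# Wealthier, older customers are more likely to attend (confounding)...
p_attend = 1 / (1 + np.exp(-(0.03 * (df["age"] - 50)
                             + 0.5 * (np.log(df["assets"]) - 11))))
df["attended"] = rng.random(n) < p_attend
# ...and the outcome depends on the same confounders plus a true +200 effect.
df["next_year_revenue"] = (0.01 * df["assets"] + 10 * df["age"]
                           + 200 * df["attended"] + rng.normal(0, 100, n))

confounders = ["age", "assets"]
treated = df["attended"]

# 1) Estimate each customer's propensity to attend from observed confounders.
ps = LogisticRegression(max_iter=1000).fit(df[confounders], treated)
df["pscore"] = ps.predict_proba(df[confounders])[:, 1]

# 2) Match each attendee to the non-attendee with the closest propensity score
#    (1-nearest-neighbour matching with replacement).
controls = df[~treated]
matched = [
    controls.loc[(controls["pscore"] - p).abs().idxmin(), "next_year_revenue"]
    for p in df.loc[treated, "pscore"]
]

# 3) ATT = mean outcome of attendees minus mean outcome of their matched controls.
att = df.loc[treated, "next_year_revenue"].mean() - np.mean(matched)
print(f"Estimated effect of attending (ATT): {att:.1f}  (simulated true effect: 200)")
```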
So all these have been happening in the business world, at least in my personal experience, and I've seen them applied in some other corporations, mostly large corporations.

Yeah, that's great. If I can summarize the ideas that Victor and Patrick have discussed: there are two branches of methods other than A/B testing and experimentation. One is causal inference; the second is recommendation, which Patrick mentioned, and which is causal by nature. I want to shift a bit to Thomas for the next question, because, at least for recommendation, that's an optimization problem, and you are leading AI products for optimizing decision processes. So how would you see these problems? How would you see the connection, with causality as part of recommendation?

So, first of all, I'm interested that most of the panelists here are into consumer, B2C, markets, and my concern, my focus, is much more on internal processes. Like, you have 5,000 bank clerks and you are trying to make sense of what they're doing and how they're doing it. My use of causal science is trying to reconstruct the workflows from observations. So, to answer your first question, what are the causal science capabilities other than experimentation and A/B testing: typically, I'm not involved in A/B testing, I'm involved in that kind of thing. The second aspect, the question that could be answered by causal science in my particular processes, and this is something that you did point out earlier, is the generation of explanations. Because when we have 5,000 bank clerks and something like 100 or 200 processes that these bank clerks can be assigned to, like under this overdraft policy, this overdraft alert, under this fraud alert, et cetera, how do you make sure that all of them are at the same level, or have some kind of common background and common ground to build on? And typically, in today's hybrid decision systems, which involve both some kind of algorithm that will orient or make a pre-decision and a human who will validate the decision or not, you have to use causal science so as not to confront the end user with a pre-digested result that they just have to click on, which will generate automation bias, or, conversely, produce recommendations that are so surprising and so absurd that you will lose the trust of your customer in the recommender system. And that's, I believe, where causal science can be very helpful, and that's typically not at all the type of use case that the other panelists address.

Thank you, Thomas. This is very interesting. You brought process-type problems to this panel, because I think one thing we quite often miss in causal science studies is process: different processes and how to optimize them. I'm trying to connect the dots here. So, Patrick, you mentioned several methods to inform product optimization; Victor, you mentioned in particular the causal structure learning from Judea Pearl, which requires some prior knowledge of the past; and Thomas, you mentioned how process optimization works across different processes. Going back to the definition I initially introduced, it sounds like we need to learn some sort of causal knowledge to start with in order to utilize many of the causal methods we mentioned, and especially to turn that into recommendations and, eventually, optimizations. So a natural question is, first: how do we learn such knowledge?
And second, how do we translate such causal knowledge about the past into recommendations? If you can give some end-to-end examples, that would be perfect. Should we start with Patrick?

Yeah, sure. So I just want to clarify: when you say learning such knowledge, do you mean knowledge about these methods, or knowledge about the business?

About the business, about the world, about the causal structure of the world.

Okay, cool. Yeah, how do we get the knowledge about the causal structure? I lean heavily on colleagues in product to understand the business process a lot better than me, and on just interviewing people. It's a really nice advantage we have in learning what business processes are, as opposed to, say, academia, where you have the world out there: we can talk to everyone who is involved, given enough time, everyone involved in making the decision. Because all the decisions are made internally and we have all of the code, we can chase down the code and learn how all the decisions are made. And so that's a real advantage of doing observational causal inference in industry. Thinking about an example from the past: I mentioned before about metrics, and trying to get better estimates of proxies that other teams can use to direct their work and to measure their success. We have many products, and one important product where we wanna measure the incremental value when customers make a certain decision. Now, this is really, really hard: we wanna know the incremental profitability of this decision, but actually choosing to take this decision is correlated with profitability. So we had some strong economists look at this and get this value. And one thing we found was that you can adjust for a lot of confounders here, and that works, or you can adjust for one key confounder, and that gets you, like, 90% of the way there; the numbers are really, really similar. And the benefit is that it's really easy to explain internally to people who don't understand these methods. So you're actually coaching them, and they know that correlation is not causation, but now we can show them exactly how this was working numerically, and that's really useful. And then once we have this estimate, we can provide this guideline about confounding to other teams, much in the way that Dream11 was proposing yesterday, and other teams can use this metric in their decisions. So if they get customers to sign up to this product, they can actually estimate the value of that sign-up, and then they're able to use that to measure their success over the year.

That's great. And I especially like the confounding part, and how to explain confounders to people who are not in the area. So, Victor, I know you have a lot of experience juggling between the domain experts and the statisticians or technical people; what is your experience in building up the knowledge of the past, especially the causal knowledge?

Yeah, I think there are three approaches, and you can use a combination of the three. The first one is basically common sense. We all have some common sense, and we know that, let's say, if you have variables like age and gender, gender doesn't change age, age just increments every year, and it doesn't work the other way around either.
And if your company is running an RCT or that kind of thing, you know that gender and age also don't affect your company's launch of that campaign. So there is some common sense you quite often already have. The second is domain expertise, domain knowledge, and that really relies on your expertise and your experience in the industry. Third, if you don't have the first two, or you have some of the first two but only partial knowledge, you can still rely on structural learning. There are many structural learning algorithms that allow you to infer the causal structure. There's no guarantee that it's highly accurate, but it at least gives you some idea of which variable may be causing which. So it's a combination of those three, in general.

Great, great idea. And Thomas, I know your job is not limited to causal models, but with my limited understanding of optimization, you need to have some model to start with. So what is your experience in terms of building a model of the past?

Indeed, it's fairly new for us to venture into, basically, Bayesian networks, because most of my experience is in prescriptive analytics, and the way I usually present it is that I like to say that most of the machine learning we know of is descriptive intelligence, in the sense that it's about describing the world from induction, whereas prescriptive intelligence could be described more as deductive intelligence. That is to say, from the facts that we observe, we deduce how things should be, and causal science should be there. It's very natural, as you presented, but let's face it, it's still fairly new. And one of the reasons I believe it's still in its inception is that it's actually very hard to gather all the stakeholders of a business process, or of a business decision that has to be made, and to place them at the same level. If you introduce them to a graphical model, they're just going to say, please summarize it for me, or please make it simple. And I'm sorry, but it's the complexity of things. And vice versa: if the data scientist comes and says, this is the ROC curve and this is where we should set our cut point for the classifier we built, then the guy says, but my workforce of 10,000 bank clerks is completely overwhelmed at the point where you have way too many false positives. So how do we do that? What matters in order to optimize a business process is to bring these people together and have them each express their constraints in their own language, so that each can take the constraints of the others into account and they can come to something together. And this is what we're trying to do, and I think this is a key to the adoption, the future adoption, of causal science in general: trying to develop tools, and I'm not talking about just graphical models, because those are not really understandable, to make them understand each other.

Thank you, Thomas. I think you have foreshadowed my next question. It also seems related to the definition I introduced at the beginning: to have a prescriptive intelligence system, we need to understand the goals and we need to understand the constraints, and those two seem to be part of the knowledge that we need to seek from human subject matter experts. So for the next question, we shift to the second part: how do we connect the dots from causal science to the future, the ideal prescriptive intelligence?
And Thomas, I know your job is not limited to causal models, but with my limited understanding of optimization, you need some model to start with. So what is your experience in building a model of the past? Indeed. It's fairly new for us to venture into Bayesian networks, because most of my experience is in prescriptive analytics. The way I usually present it is that most of the machine learning we know of is descriptive intelligence, in the sense that it describes the world by induction, whereas prescriptive intelligence is better described as deductive intelligence: from the facts we observe, we deduce how things should be, and causal science should sit there. It's very natural, as you presented, but let's face it, it's still fairly new. One reason I believe it's still in its inception is that it's actually very hard to gather all the stakeholders of a business process, or of a business decision, and place them at the same level. If you introduce them to a graphical model, they're just going to say, please summarize it for me, please make it simple. And I'm sorry, but that is the complexity of things. And vice versa: the data scientist comes and says, this is the ROC curve and this is where we should set the cut point for the classifier we built, and then the other person says, but my workforce of 10,000 bank clerks is completely overwhelmed, because at that operating point you have way too many false positives. So how do we handle that? What matters in order to optimize a business process is to bring these people together and have each of them express their constraints in their own language, so that each can take the constraints of the others into account and they can converge on something. This is what we're trying to do, and I think it is a key to the future adoption of causal science in general: developing tools that make these things understandable, and I'm not talking only about graphical models, because on their own those are not really understandable to everyone. Thank you, Thomas. I think you have foreshadowed my next question. It also seems related to the definition I introduced at the beginning: to have a prescriptive intelligence system, we need to understand the goals and we need to understand the constraints, and those two seem to be part of the knowledge we need to seek from human subject matter experts. So the next question, as we shift to the second part: how do we connect the dots from causal science to the future, the ideal prescriptive intelligence? I provided my personal definition, but I'm open to challenges. In your personal view, what would be your vision of the North Star of a prescriptive intelligence system for an enterprise? We can go in order: Thomas, Victor, then Patrick. That is to say, me first? Yeah. Okay. So, as I hinted at the very beginning, prescriptive intelligence delivers optimal results, optimal once you have provided the constraints and everything, but it's just not digestible by the decision makers. My perspective is from human factors: how do you make the decision makers capable of understanding a complex simplex program, linear programming, or anything like that? For me, this is my current use of Bayesian networks: to distill very complex mathematical programs into their gist, to make them digestible by high-level executives who have to decide what a 10,000-person workforce will do in the coming months and how they will work. Great. Thank you, Thomas. And Victor? Yeah. So earlier this year, I actually published a paper with a professor friend on a causal, prescriptive framework. Can I share my screen? Sure. Yeah. Okay, this will be really quick. Basically, we talk about what your objectives are first, then what the desirable outcomes are that you want to achieve, what the decision variables are that you can change, with their constraints, and what insights, models, and data you have. Then you find a solution to that problem, and through this process you might have to deal with causal inference. In our paper, we address situations where causal inference is required, and also situations where it is not required, because you already know the causality or it's very straightforward, so you don't need to apply causal inference techniques. But in many cases you do need causal inference, and as a result it's a combination of multiple techniques: machine learning and predictive analytics, optimization or constraint optimization, as well as causal inference. We summarize many of these examples in our paper; happy to share it. Yeah, if I can pause for a second, Victor, could you expand on this? It seems a bit unclear to me how you connect causal inference to optimization. Are you building a structural model from causal inference and then using that causal model for optimization? Just to put things into context, say you want to personalize things; how do we use this framework? Right, right. Probably the most obvious example is when you're doing something like marketing, and you have a lot of possible treatments and a lot of customers, and you need to know which treatment to assign to which customer. To Yavor's earlier point, with LLMs we might end up with many more combinations of treatments, and someone will have to test those things on the back end; you'll almost have to run an experiment or do causal inference one way or the other. So you have to somehow infer the causal impact of each of these treatments, and it could be channel-specific, message-specific, and also individual-specific. Then, with that information, you can optimize. There are a lot of treatment combinations, and you will need some kind of constraint optimization to handle it.
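As a minimal sketch of the kind of treatment assignment Victor describes, here is a greedy heuristic that allocates offers to customers from estimated per-customer treatment effects under a budget. Everything here is hypothetical (the uplift numbers, the costs, the budget), and a real system would typically use a proper integer or linear program with business constraints; this only shows the shape of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)

n_customers, n_treatments = 1000, 4
# Hypothetical estimated incremental value (uplift) of each treatment for
# each customer, e.g. from a causal or uplift model. Treatment 0 is "no offer".
uplift = rng.normal(loc=[0.0, 1.0, 1.5, 2.5], scale=1.0,
                    size=(n_customers, n_treatments))
# Cost of sending each treatment to one customer, and a total budget.
cost = np.array([0.0, 0.5, 1.0, 3.0])
budget = 800.0

# Greedy heuristic: rank customer-treatment pairs by incremental value per
# unit cost and assign while the budget lasts (each customer gets at most
# one offer beyond the default "no offer").
assignment = np.zeros(n_customers, dtype=int)   # default: treatment 0
remaining = budget
gain = uplift - uplift[:, [0]]                  # incremental value vs. no offer
order = np.argsort(-(gain[:, 1:] / cost[1:]).ravel())
for flat in order:
    cust, treat = divmod(flat, n_treatments - 1)
    treat += 1
    if assignment[cust] == 0 and gain[cust, treat] > 0 and cost[treat] <= remaining:
        assignment[cust] = treat
        remaining -= cost[treat]

print("expected incremental value:", gain[np.arange(n_customers), assignment].sum())
print("budget used:", budget - remaining)
```

The greedy ranking is exactly the sort of heuristic shortcut Victor mentions later: the causal estimates feed the optimizer, and the business constraints decide how far the budget stretches.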
And that problem has actually been around since at least the early 90s, and as far as I know it came from the credit card industry first. In the 90s I was a management consultant for some credit card companies, and they were already doing heavy experimental design, analytics, analytical modeling, and optimization in this area. So I think we will be doing more of this. That's just one example; there are many, many examples where you can apply the framework I just mentioned. Gotcha. So it can be used to narrow down the variants that could potentially be the treatments. Yeah, to optimize the treatment for each person. Basically, that would be it. Gotcha. That's just one example of the framework. Great, great. And Patrick, what do you think? I think those were very good answers. What I would add is this: we're talking about making the right decisions and using causal science to do that. What I would have in my North Star is one extra layer on top, or a component at the end: what is the value of those decisions, and how can we feed that back into the whole process? Then teams know where they sit in the org and how they're contributing to the org and the company's long-term vision. I think this will help keep everyone aligned and make sure that, if our proxies do start to deviate with distribution shift and what have you, we can come back, realign, and make sure we're still making the right decisions. Great, great. I think one key challenge on the way to that North Star is how to engage key stakeholders and align their goals and constraints, as Thomas suggested earlier. So what are your thoughts on how we should engage key stakeholders? Where do you see the frictions or challenges down the road? Victor, do you want to start with this one? Yeah, I can think of two areas; there could be more than two. One is awareness, or visibility, of causality and optimization, especially causality, both for business stakeholders and within the academic community. I say this because even though causal inference has been around for close to a hundred years, starting with people like Jerzy Neyman and some famous economists, most data scientists today have a CS major, and causal inference is not yet in the mainstream of CS or even AI study. Judea Pearl has been trying really hard to promote it for the last 30 or 40 years, but it's still not mainstream, so there's still room to grow. People trained in the medical and social sciences have been doing it for a long time, whether rigorously or in an approximate way. So if we can hire more data scientists who understand causality better, that would be one way: awareness and talent. The second point is, how do we find talent that understands all of these things? Because it's really very multidisciplinary. There are many people who understand machine learning and predictive modeling; people who understand experimental design or causal inference are fewer, especially causal inference. You also want them to understand constraint optimization, and not just in a mathematical programming way: quite often you need to apply heuristics, and you need to apply them in the right way.
So understanding all three of them may not be easy. Of course, you can hire talent from different fields and have them work together. Those are the two points I wanted to make. Yeah, that's interesting. I had never thought of there being a gap in computer science in terms of causal science, but once you said it, I realized, yes, that might be a potential friction area. I want to turn to Thomas, because as you mentioned, you work on bringing humans and algorithms together and engaging both in the process, and this is a question you raised earlier: the difficulty of engaging different stakeholders around goals and constraints. So how do you work on that? Well, I think it's fairly interesting how the domain of prescriptive analytics has evolved. It started with linear programming and raw optimization. Over the years, between the 80s and the 90s, a new concern emerged, which was robustness, and so we got stochastic optimization: the goal was not just to find the optimal solution as defined, but also to ensure that if the parameters change a little, the solution does not degrade too much. And since, I think, the beginning of the 2010s, the new challenge has become simplicity. Whenever an operations researcher is tasked with devising a solution, the old-school one will just search for the optimal solution; the newer one will ask, okay, but is my solution robust? And what I see in the newest generation is: can I explain the solution to my boss? That is how you win people over. I can share a slide. I don't want to do promotion, I mean it's just an example. Feel free to do some shameless self-promotion. I'm not going to talk about the product, but I'm going to tell you how it came about. Do I have two minutes? Yeah, we have a couple of minutes. Okay. The whole product line I'm working on started with a project we did with a bank. One small thing you need to know: in Quebec there is a delicious dish called poutine. If you try to order some poutine from Quebec to Europe and you unfortunately write in the comments, this is for poutine, your transaction will take a little more time, and the reason is that it could be read as: is this for Vladimir Poutine, the French spelling of Vladimir Putin? Of course Vladimir Putin is under financial sanctions, so there will need to be a review. This is an innocuous problem for you, but for a bank it's a huge problem, because out of 1.5 million transactions per day, 50,000 alerts are raised. And if you make a mistake, that is to say Vladimir Putin manages to order poutine in Quebec, the fines range in the billions of dollars. So it's a very serious problem. And to address it, you have basically the inverse of artificial intelligence: you have people doing the job of machines. A first layer filters the alerts, the obvious false positives; a second layer studies the cases a little more; and in the end only 0.01% are actual fraud, because Vladimir Putin knows better than to write his name when he wants to order poutine. It seems very obvious that we should use machine learning to supplement this; it looks like the obvious type of application for machine learning. And indeed, this is how the project started, and we came up with a number of solutions which we thought were acceptable.
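A solution of that kind ultimately comes down to choosing an operating point for the alert classifier, the cut-point tension Thomas raised earlier. Here is a minimal sketch of picking a threshold from an explicit cost matrix and a review-capacity constraint rather than from accuracy alone; the cost figures, capacity, and simulated scores are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scored alerts: a model score in [0, 1] and the true label
# (1 = genuine sanctions hit, 0 = false alarm such as an order of poutine).
n = 50_000
y_true = rng.random(n) < 0.0001                      # roughly 0.01% true hits
score = np.clip(rng.normal(0.2 + 0.6 * y_true, 0.15), 0, 1)

# Illustrative cost matrix: a missed true hit is catastrophically expensive,
# a false positive costs analyst review time.
cost_false_negative = 1_000_000.0
cost_false_positive = 25.0
review_capacity = 5_000                              # alerts the workforce can handle per day

best = None
for threshold in np.linspace(0, 1, 201):
    flagged = score >= threshold
    fp = np.sum(flagged & ~y_true)
    fn = np.sum(~flagged & y_true)
    if flagged.sum() > review_capacity:              # workforce constraint
        continue
    expected_cost = fp * cost_false_positive + fn * cost_false_negative
    if best is None or expected_cost < best[1]:
        best = (float(threshold), float(expected_cost), int(flagged.sum()))

print("threshold, expected cost, alerts to review:", best)
```

This is the kind of object stakeholders can actually negotiate over: the cost matrix and the review capacity come from the business side, while the scores come from the data science team.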
Now, what happened is that the project was rejected, not because we thought it didn't work, but because we could not resolve the question of who is accountable for those 30%. Is it the data science team, or is it the process owner? And the product line we're building attempts to answer these questions: who in a decision process should decide, how do you arbitrate between the human and the algorithm's recommendation, how do you take possible automation biases into account, and how do you make sure you can always answer who is accountable for the decision being made and, most importantly, whether they have the means to accept that responsibility? I'm not going to take much more time: we're providing a development environment where you can test these hypotheses and answer these kinds of questions interactively, so that everybody can work together on the performance models and the cost matrix and come together to decide how to make the best of humans and machines. Thank you, Thomas. I have to admit, I probably need another full one-hour session to understand what this is really about. We have one last minute to close this session, so we have time for maybe one quick question. There is a question about how optimization is an iterative process over different counterfactual scenarios. Who would like to briefly answer this? Can you clarify what you mean by that, sorry? So the question is: counterfactual scenarios are the method behind causal inference, right? Do we see optimization as an iteration over running different counterfactual scenarios? Yeah, I can offer a small comment on that. In some of our work, we have a large forecast being fed into an optimization problem, and we want to improve this optimizer; we've been working on that over the last year. One question is: does the new version work better than the old version? You can run A/B tests, and we've done that. But during development, we wanted to run simulations and do counterfactuals. We could do some, but we were limited, because we weren't super confident about the degree to which we were adjusting for all the necessary confounding in our forecasting problem. So rather than optimization on top of counterfactuals, I see it the other way around: many businesses have a really good understanding of how to do optimization, and if we can get better inputs that are amenable to counterfactuals, then we have something really, really powerful, especially for optimization model development. Gotcha. Thank you, Patrick. I think we're at time. Thank you very much to all the panelists, and also to Scott and the other panelists in the first part. And thank you, Paul, Germain, and the others for organizing this conference. Please join me in giving another big round of applause to everyone. Thank you so much. Thank you very much, Victor. Thanks to all the panelists, and to Scott as well for hosting this.