Hello, everyone, and welcome to our next EDW session, called AI Governance: Driving Compliance, Efficiency, and Outcomes with RBC Bank, which will be presented today by Zayn Israela, Senior Manager of AI Research and Data Science at RBC Bank, and Michael Heind, Distinguished Research Leader at IBM. All audience members are muted during these sessions, so please submit your questions in the Q&A window on the right side of the screen, and our speakers will respond to as many questions as possible at the end of the talk. Please note that there is a linked form at the bottom of the page titled EDW Conference Session Survey; this is where you can submit feedback, and we encourage you to do so. Also, there is a small icon to the lower right of the screen that will enlarge the window with the speaker and slides. So let's begin our presentation now. Thank you and welcome, Zayn and Michael.

Thank you, Eric. Hi, my name is Michael Heind. I work for IBM Research, in a department that focuses on trusted AI, with the goal of making AI trusted, responsible, and so on. It's my pleasure to be presenting with Zayn. Zayn, do you want to introduce yourself?

Sure. Hi, everyone. I'm Zayn Israela. I work at RBC as a senior manager on a model validation team. Our team is basically responsible for testing and assessing models before they go into production, to make sure they're safe for everyone at the bank to use. And I should probably mention that I'm here to share my personal perspective; I'm not officially speaking on behalf of RBC.

Great, thanks, Zayn. So today's topic is AI governance, and I just want to go through what we have in store for you today. In the first half of the session, we're going to answer some basic questions just to help define things, because AI governance means different things to different people. We'll go through that, talk about why it's important, and so on. In the second half, we have some discussion topics, some issues to dig into, and finally we'll have time for Q&A from the audience. So think about what questions you want to ask, because we're definitely going to have time at the end.

To kick us off, Mike, why don't you define AI governance for the audience? Sure. AI governance has a few different meanings, and what we're talking about here is governance in the sense of some level of control, hopefully for good reasons. You want the ability to control how your AI is developed, deployed, and monitored, to ensure you have good outcomes and avoid bad ones. In particular, you want to minimize the risk of those bad outcomes, because you can't always prove that certain things will or will not happen.

So with that definition, let's get to the next question, which is: Zayn, why is this important? You really hit the nail on the head when you said the word risk, because from my perspective, the whole purpose of governance is to facilitate the risk management function of an organization. And there are many different types of risk. You've got your basic business continuity risks: if someone develops a model and then they're no longer in your organization six months later, and you don't have the documentation, you now have a key-person risk. You've essentially lost everything about that model, and now you've got all these problems. But it's not just business continuity; there are many different types of risk.
There's the risk of financial loss: for example, if we lend someone money and they default and we write it off, that's a financial loss for the bank. There's the risk of reputational damage: if we develop a model, put it into production, and it's unfair, there's reputational risk there. There's also the risk of legal action: if a model is supposed to be doing something and it doesn't do that thing in the right amount of time, we might be violating some sort of service level agreement. There's security and cyber risk, if there's a malicious attack on your organization. And then there's the risk of failing to comply with regulations, for example the strict regulations that exist around money laundering. So there are many different types of risk, and I think governance is the central point for controlling and facilitating the management of that risk.

So assuming a team does develop a good governance system for their organization, can you tell me a bit about the benefits that team can expect to observe once the system is in production?

Sure. The idea is that you would have some central authority, maybe a chief risk officer or even a chief AI officer, who defines policies for the organization. And by policies, I don't mean what people often call principles, like "AI should not be biased." Clearly, that's a good thing. By policies, I mean things that are actionable, that are executable. For example, staying with bias, we would say we should measure bias at a certain point during the development of the models, say after the data scientist thinks they have a model that's worthy of being deployed. And we should measure bias in a certain way; it turns out there are many different ways you can measure bias. We should also make sure that whatever value we measure is what we want. For example, suppose I have a bias measure where zero is fair, plus one is biased in one direction, toward one group, and minus one is biased in the other direction. What happens if I get a value of 0.01? It's really close to zero, so it's not perfectly fair, but very, very close. Is that okay? And it's not up to the data scientist to decide this. It's a decision that needs to be made by the owner of the product; legal probably needs to be involved, maybe a chief risk officer, and so on. So that would be specified in a policy. It would be very concrete, something actionable, with a clear decision on what's acceptable and what's not. And the reason for this is that you clearly don't want a product that, in this particular case, is biased. As Zayn said, there are many reasons why you don't want that: there could be regulations, like the ones the EU just announced today, or it could be your brand reputation, and so on. So you want to minimize that risk, and you want to have that control.

You could take a step back and say this idea sounds familiar. If you've worked in software development, there was a trend maybe a few decades ago where people were saying, hey, I'd like to control my software development, because I want to reduce the bugs that I have. So there's a similar analogy there; it's not exactly the same, but similar.
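To make that bias-policy example concrete, here is a minimal sketch of what an executable policy check might look like. It is illustrative only, not IBM's or RBC's actual tooling; the metric is statistical parity difference, a signed measure of the kind Michael describes (zero is fair), and the tolerance value is a hypothetical number that a product owner, legal, and a chief risk officer would have to sign off on.

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """Signed bias measure: 0 is fair, positive favors group 1,
    negative favors group 0 (favorable outcome = prediction of 1)."""
    rate_g1 = y_pred[group == 1].mean()
    rate_g0 = y_pred[group == 0].mean()
    return rate_g1 - rate_g0

# Hypothetical policy tolerance agreed with the product owner,
# legal, and the chief risk officer, not set by the data scientist.
POLICY_TOLERANCE = 0.02

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # model decisions
group  = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # protected attribute

spd = statistical_parity_difference(y_pred, group)
print(f"SPD = {spd:+.3f}")
if abs(spd) <= POLICY_TOLERANCE:
    print("Within the agreed policy tolerance")
else:
    print("Escalate: outside the agreed policy tolerance")
```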
And then finally, another benefit you can get out of this is that once you have this level of control, you also have the information you gathered during the process. You know, for example, which models were developed in a good way and didn't fail any policy checks. Maybe there's something you can learn from that; there's a sort of meta-level analytics you can do on top of just checking that a model passes your policy.

So with those benefits in mind, we're going to talk now about stakeholders. Zayn, do you want to take it? I think you can take it, because you already mentioned a lot of the people who are very important in this process, but maybe I can provide some perspective. People don't often think about the model users, the end users, as key stakeholders: the people who are looking at the outputs of the model, because they're typically not consulted in the model development process. You have a business need, you have a data science team, they put together a solution. But you really do have to understand how those end users are going to interact with what the model is outputting. How does the information even come to them? Is it through software? Is it in a report? Understanding how they view it ties back to your points about bias, because you can influence their decisions with model outputs. You have to be careful about this, and you have to think about everything in the model development process. I mentioned model users, but there are actually a lot more stakeholders than just the model users and the developers. Do you want to expand on that a bit?

Sure. There are a few ways of looking at this. By stakeholders, we mean people who have to do something, or who can benefit from this governance. So who has to do something? Well, if you think about how a model is developed, it's not just a data scientist. It probably starts with some business user who says, hey, I've got an idea, I have a need: I'd like to improve my overall business performance, or reduce my overhead, or improve efficiency. And so they request a model from a data scientist. Even that business user has information that would be valuable in deciding whether this model should be reused in a different situation, so they're one stakeholder. Another stakeholder, of course, is the data scientist, or team of data scientists, who builds the model or models. Next you have your validator, something Zayn knows very well, who is looking at the model and trying to make sure risk is minimized. Finally, the model gets deployed, if you're lucky, and then you have someone like an operations person who monitors the model and makes sure it's performing well. Those are all personas that are part of the life cycle, and they all certainly have a major role in making all this work. But as Zayn was saying, there are others as well who are important. You may have, for example, the chief risk officer we mentioned, or the legal department, or the executive in charge of the brand: all folks within the enterprise who didn't necessarily touch or build the model, but who play an important role. Then you have regulators, who will come in and make sure the model was actually produced in a fair way. They're certainly important. And then probably the most important people are the impacted customers.
For a loan approval model, this could be the person who was denied or approved a loan, or maybe the loan officer, who is a customer but not the affected customer. All these people would like to benefit from having this governance, and from having some level of information and transparency about what happened when the model was built.

So moving to the next topic: Zayn, since you live this, this is your day job, can you share how things are done today?

Yeah. It all starts with the need, like you mentioned: somewhere in an organization, there's a business need to solve some sort of problem. Once you've identified the business need, you need to engage the development team. That can sometimes be a partner who's internal to the organization but in a different line of business, or it could be a third party, a vendor. They come in, you typically describe the problem to them, they tell you what data sources and information they require, and then they go off and develop a model. Once they're done developing the model, that's where our team comes in. There's an inventory management system within RBC. The model developer submits their model through that system, and it arrives in our queue. We pick it up from there, and we typically schedule a meeting with the model developer to understand a bit better what this model is and what it's all about. And when developers submit their model, they have to create model documentation, and this documentation follows a standardized format. It's a very lengthy, very comprehensive document. They have to write down everything they did and why they did it, who the key stakeholders involved are, essentially the end-to-end of the entire model development process. And the reason they have to write all of that down is that our team, the validation team, is going to read that document, talk to the development team, and assess the model end to end. So we're looking at: is the model appropriate for the business problem they were trying to solve? Does it make sense for the development team to use machine learning here? Which algorithms did they use? Are those algorithms conceptually sound, and do they make sense in this context? What are the assumptions and limitations of those algorithms? Where is the data coming from, and are the various sources appropriate? What are the actual inputs into the model? Is there a feature selection process, how did it happen, and what is the rationale? We drill down on every level of decisioning, from the beginning of the development process right up to the end, and we look at how everything fits together. Once the model produces its output, how does it get to a user? What are the change controls that govern the model in production, to make sure it can't be touched while it's out there? How are the users actually using the model output, and how does that influence the business ROI at the end of the day? In other words, how is the model performing not just statistically, in terms of accuracy or other statistical performance measures, but also in terms of business metrics? Can we actually say this model is having some tangible benefit for the business? We try to quantify that.
Then, typically, in the process of reviewing the model, we come up with a risk rating for it. The idea is that this risk rating tells you how important the model is, and the amount of time we spend on the model and the level of scrutiny we apply is roughly proportional to its perceived risk rating. So if a particular use case is more important, we will spend more time on that use case. If the model is more uncertain, or more complex, we might want to spend more time with it. Once we've done our piece, the end-to-end review, we write a report ourselves, and we summarize the main findings at the top of the report. We describe them as issues: things related to the model that the development team should go back, revisit, and try to fix. Assuming there are no critical issues that would prevent the model from going into production (and typically speaking, a model will have multiple open issues when it does go into production), the development team's job beyond that point is to go back and fix those issues within a period we've agreed on with them. So if it's a very severe issue but not super critical, we might say: take the next month and fix this problem. You can put the model into production, but we need evidence that you fixed this within a month. That's roughly what the interaction looks like. We present all of these findings to the heads of model risk and model governance. They review what we found and approve it, and then it goes for business approval to the executive team responsible for those models. This goes back to your point about stakeholders: lots of different people are involved, people you wouldn't normally think would be involved, people who have no direct impact on the development of the model, and they are responsible for reviewing the validation report and making sure the model is sound to go into production. Once all the approvals are done, the model actually goes into production. Typically, there's a team that monitors the model, like you said, to make sure the model is performing not just as we expected it to, but that it continues to perform as we expect. So they monitor things like drift and so on.

And during the validation process, we try not to just repeat the tests that the model development team ran. We don't want to just do the obvious things; we want to incorporate material from research. We do things like adversarial robustness, uncertainty quantification, fairness testing, model drift, ablation studies, finding various ways to uncover vulnerabilities in the model. After you've validated a number of models, you start to see where these things pop up, and when you get documentation, you can read it and get a sense of where those issues are likely to be. Then you can design your validation plan to target those points, so you can find insights that help the model development teams improve their models and safeguard them from risk. And that's really the purpose of our team: to provide an effective, independent challenge to something someone in the bank has done. That's a high-level view of what the governance framework looks like right now, and it's a lot more complicated than I've described, because I'm giving a very high-level view.
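As an aside, here is a hypothetical sketch of the risk-proportional logic Zayn describes, where factors such as materiality, complexity, and uncertainty drive a rating that in turn drives review depth. The scoring and tiers are invented for illustration and are not RBC's methodology.

```python
# A hypothetical risk-tiering sketch, not RBC's actual methodology:
# materiality, complexity, and uncertainty combine into a rating that
# determines how much validation scrutiny a model receives.
def risk_rating(materiality: int, complexity: int, uncertainty: int) -> str:
    """Each input is scored 1 (low) to 3 (high) by the validation team."""
    score = materiality + complexity + uncertainty
    if score >= 8:
        return "high"     # deepest scrutiny, fullest end-to-end review
    if score >= 5:
        return "medium"   # standard review
    return "low"          # lighter-touch review

print(risk_rating(materiality=3, complexity=3, uncertainty=2))  # high
```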
Each of these components takes a length of time. But here are some interesting insights I've taken away from the process. Machine learning models, or AI models, are different from traditional models. They're still statistical models, and they're still coming from the same source material, so that part hasn't changed. But typically, with machine learning models, you have more parameters, and the way these models are implemented looks more like software. I think you mentioned this earlier as well: machine learning models are now implemented in the form of libraries and packages, and there's a lot more emphasis on code, whereas in the past a traditional financial model could be implemented in an Excel spreadsheet. That would never happen for a machine learning model; the actual implementation is a lot more sophisticated. There's also an increasing reliance on algorithmic techniques. In the past, people would spend a lot of time using their domain expertise to do things like feature selection or hyperparameter selection. Now there are hundreds of different techniques for AutoML, for feature selection, for hyperparameter tuning, using various Bayesian methods and so on. So there's a lot more reliance on algorithms for tuning the model and making sure you have the best version of it. That's some of the insight I've taken away from working with these models, and I think once we start talking about what some of the issues are, we can deep dive from there.

Just to add on to that: thanks, Zayn. I didn't know a lot about this, and in my conversations with Zayn over the past few years, I've been amazed at the amount of due diligence that goes on. On the surface, you would say, okay, a model validator's job is to assess risk, so it's very serious and you would expect them to do a lot of things. But whatever you have in mind, it's actually much more exhaustive and serious than you might think. One example: I should have mentioned this earlier, but I lead a project at IBM called the fact sheets project, which is about producing this kind of documentation and enabling governance. It has a public website that I'll put in the chat later. We put up some examples there of what we thought would be good documentation of models for assessing risk, and for the most part they were done by hand. I had a session with Zayn, who was kind enough to take a good amount of time to give us feedback on them. I thought we had overdone it, that we had provided way too much information and too much analysis; it took, I think, several months to do. It turned out we hadn't. Zayn was very kind and had some positive things to say, but if anything, the feedback was that he'd like to see a little bit more in those examples. So if you want a feel for something concrete that's public, you can check out the website; I'll put a link in shortly. We have examples of the kinds of things Zayn was talking about.

Yeah, that's actually a really good segue to the next question, and maybe the next slide as well, Mike. One of the conversation points we wanted to cover was documentation, because like you said, documentation is tough. In the development process, you're making hundreds of decisions, and it's really hard to describe everything you're doing and also to rationalize it.
So it's not enough just to say "I used technique X"; you have to be able to justify why technique X. Why does it make sense to use this particular technique? One way you could provide evidence for that is to benchmark a number of different ideas and then say: this is the one I settled on, because it performed the best, or because it met some computational or business constraint I had. But you could also cite a couple of papers, do a literature review, and say: I went through these, and I think this technique is the one I want to use for my particular problem, because it's the most applicable based on its conceptual soundness and the assumptions behind it. And we accept both. Our goal in the validation process is not to say you're doing this wrong, or that you have to do things in one particular way. What we're really looking for is due diligence from the model developer's side. We want to see that they've put in the time and the effort to think about the problem, and that they're writing down that rationale wherever they can. When I was giving you feedback on the fact sheet initiative, you had put a lot of great content there, but there were some points where I was able to ask: this is great, but could you expand on this particular point? You provided this number here, but you haven't really explained whether it's a good or a bad outcome for this case. That's also the type of commentary we're looking for from developers. It's not enough just to put up a plot and say my model performance is 85%. You have to comment: is 85% good, is it bad, and how does it fit into the domain of solutions? Is it better than a prior baseline? Are you performing worse, or are you on par? Because if you're on par with a rules-based or heuristic process and your model isn't really improving anything, why should your model go into production? You need to ask those questions, backtrack, and think about why you're doing certain things. And we look for evidence predominantly in the documentation. But because we're looking for evidence, it's a lot to write. That's why I liked your fact sheet initiative a lot: the entire intent behind it was to consolidate the most important parts of the model documentation into a standardized, simplified format. That's something we're looking into at RBC too, figuring out how to improve our processes so that it's easier for our model developers to write their documentation.

Great. And I guess for those of you who are parents or who mentor young people wanting to go into data science because they don't like reading and writing: they may have to rethink that, because some of these documents can be over 100 pages long.
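To picture the benchmarking evidence Zayn mentioned, here is a minimal sketch comparing candidate techniques under one cross-validation protocol, including a simple baseline. The data, models, and metric are illustrative assumptions, not a prescription for any particular problem.

```python
# A sketch of the kind of benchmarking evidence a validator might look
# for: candidate techniques compared under the same cross-validation
# protocol, including a simple baseline. The data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
candidates = {
    "baseline (majority class)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the chosen technique does not clearly beat the baseline under a protocol like this, Zayn's question applies: why should the model go into production at all?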
Yeah, we also have a note here on documentation in various industries. We had a conversation about this last week, where you asked me a really good question about different industries picking up documentation and adopting a governance framework. Do you want to set up the context for that conversation?

Yes, sure. Certainly when we talk to companies in the financial industry, who have regulations, you can see the need for measuring, mitigating, and reducing risk. And it's a reasonable assumption that if you wanted to find organizations doing this, it would be in financially regulated spaces. But I'm starting to have conversations with others that aren't regulated, and they're looking at the same kind of question, of putting in a pretty significant validation step like we see at RBC. So my question to Zayn was: do you think this is just a one-off, because the person happens to be a former financial person, or do you see it as a trend where other industries will pick up this methodology?

Yeah, I would hope it becomes a trend, because I think the idea is really sound. No matter where you go, it's a good idea to document what you're doing. It sounds very bizarre to me that there are organizations out there that would put models into production and then not have the correct documentation. Maybe they wouldn't go as far as we go with respect to testing everything and making sure we're complying with all the relevant regulations, but just having a good governance framework is so critically important, and it's surprising to me that it's not done more in industry, given how valuable I've seen it to be within RBC. I've worked with a lot of teams that were very apprehensive about the validation process the first time through, because there's a perception that it would slow them down, or that their timelines would be impacted, and so on. But after working with us, they found the process immensely useful, because we're actually able to give them very valuable feedback. We work with every team in the bank, so we get perspective from many different places. If there's a cybersecurity technique that's working really well there, why not apply it to anti-money laundering or fraud? That's the sort of connection our team can make. We can see a technique used in one domain and tell a team: hey, we saw this being used over there, it was really effective, this is what they did, and there's documentation for it if you want to read it. Or: here's the developer's name; go talk to them, because you're working on very similar problems, and there's probably value in these two teams collaborating. That's one of the positive outcomes that can come out of a good governance framework: you're able to make these connections, because things are written down and people know how to contact each other. I think that's really important, because right now, maybe what organizations lack is a good platform for AI governance. At RBC, we have a solution in place; we've had one for a long time, and we're extending it now to make it more compatible with machine learning models. But not every organization has that. I know IBM has done some work in developing those platforms. Do you want to tell us a bit about the work your team is doing on AI governance, Mike?

Sure. We started on this journey looking at specific, I would say, pillars of trust, different aspects that we've mentioned already. One of them, of course, is bias, or fairness: how can you detect it, how can you mitigate it, and so on. We have an open-source toolbox in that space called AI Fairness 360, and we also have a product in the space called AI OpenScale. Then there are other pillars as well. There's the question of explainability: can you explain a decision of a model?
We talked about adversarial robustness. There's uncertainty quantification. For all of these we actually have a lot of deep technology, some of it open source, some of it in products. The macro-level question we've been talking about is: okay, great, I have these individual ways of improving the model on some measure of trust or accuracy, but how do I govern the whole thing? That's where we started with the fact sheets project, which was first to create the transparent documentation we've been talking about. It then led to a natural pull from customers. We've spoken to on the order of three dozen different customers from various industries who are saying what we've been talking about so far: I want to govern, I want to control, I want to specify policies, I want to bring stakeholders together. They use different tools and want to rally around one view of the model and how good it is. So we've been working with our product team, and last December we announced that there will be a product in this space on AI governance, in particular around the fact sheets idea, coming out later this year. Stay tuned for more details. If you go to the fact sheets website, there's a link to the product announcement.

Going back to one of these pillars of trust, there's a topic that has a lot of letters in it: uncertainty quantification. Another way of thinking about it, in layman's terms, is confidence. The idea is: I have a model, and we measured its accuracy, let's say 90%. But is there some way of quantifying how confident it is in its predictions? When it says the answer is "approve the loan," is it flipping a coin, is it 90%, 70%, whatever it is, in some scientific way? Zayn, I know you and I have had discussions about this recently. Can you elaborate on the practicalities and the value it brings?

Yeah. Traditionally, in the statistical domain, everything is based on distributions; you get confidence intervals, and you get a level of confidence in everything you're doing. In machine learning, the emphasis shifted to serving predictions, as opposed to providing scores or distributions. With that shift in what the output of a model was, attention shifted away from quantifying the uncertainty in those predictions. If you're just looking at a 0 or a 1, it doesn't really tell you anything. Let's say you're part of an investigative team, you're getting reports that some people are up to suspicious activity, and you have to go and investigate every single one of them. Who's at the top of your list? If everyone is just assigned a 0 or a 1, with 1 being investigate and 0 being don't investigate, who do you prioritize in that list? So that's the first question to figure out. The way people try to solve that is that instead of giving the 0 or the 1, they give the model's output score. The model's output is not technically a 0 or a 1; it's actually a score, and that score is thresholded by some criterion, which converts it to a 0 or a 1. So they give that score. The problem is that, typically speaking, that score is not well calibrated. You can't necessarily say that if the model gives a 0.99 that this person is up to no good, then in 99% of such cases that's actually what happens.
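As a small illustration of that calibration point, here is a sketch that buckets a model's scores and compares the mean predicted score in each bucket with the observed event rate, using scikit-learn's calibration_curve. The scores are synthetic and deliberately miscalibrated; any real model would be checked against its own held-out data.

```python
# Check whether scores behave like probabilities: bucket predictions
# and compare the mean predicted score with the observed event rate.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)
# Hypothetical miscalibrated scores, pushed toward the extremes.
raw = np.clip(y_true * 0.7 + rng.normal(0.15, 0.25, 5000), 0.01, 0.99)

frac_pos, mean_pred = calibration_curve(y_true, raw, n_bins=5)
for p, f in zip(mean_pred, frac_pos):
    print(f"mean score {p:.2f} -> observed rate {f:.2f}")
# If the observed rate in a high-score bucket is far below the score
# itself, the score is not a probability you can act on directly.
```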
Again, it's just a score, and this requires you to understand a little bit of the math behind the scenes, because typically models don't output nice numbers between zero and one. They output unconstrained numbers, technically called logits, and in a lot of cases those numbers get squashed into the range zero to one, because people like numbers between zero and one: they look like probability scores. So it's very easy to look at those numbers and say, oh, 0.8, there's an 80% chance this is going to happen, I should probably prioritize this over something else. And it's risky to go down that route, because if you have three different models, one saying 0.8, one saying 0.9, and one saying, let's say, 0.85, the case scored 0.9 is not necessarily higher priority than the one scored 0.8, because the models were trained differently, under different criteria and conditions. You can't just look at the raw numbers and rank by them. Because of this, there's the idea that you need to be able to quantify the amount of uncertainty. If you actually had a quantification of uncertainty, then when your model says 0.9, you can get an interval around that 0.9. If it says, I'm pretty sure it's 0.9, it's between 0.88 and 0.92, now you're a lot more confident, and you can say, yes, I should probably start with this one, because it really is high. If the model says 0.9, but it could also be 0.6, so the actual value ranges between 0.6 and 1 even though the prediction was 0.9, then you probably don't want to start with that one, because the model is pretty uncertain about that prediction. And we've seen in a number of use cases that, using these strategies, which are not very commonly used in machine learning even though they're used very frequently in statistics, you can discover very interesting insights about your data and your models, because you can discover subpopulations in your data where the model is not performing well. Then you can ask yourself: my model is very uncertain for this group of people, why is that? And if you go back and investigate, you can find that, oh, there was a problem somewhere over there. That's the type of insight we're able to offer the teams we work with. And our intention is not that this only happens as part of the validation process; hopefully, model development teams take uncertainty into consideration when they're developing their models. When they do, model users get as output not just a prediction, a number saying the forecast is X or you should do A or B, but a level of confidence with that prediction, so that the people who end up using the model can make more informed decisions.

Great, that's really useful. I think one of the challenges here is how people perceive a model's confidence. Like on the Jeopardy show, Watson displayed a confidence when it made a prediction, and that's not necessarily, as you said, statistically meaningful. I think humans in general have trouble with this concept; we see it with political polls, where people don't even look at the plus-or-minus margin. So it's really important.
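One simple way to produce the kind of interval Zayn describes is a bootstrap ensemble: retrain the model on resampled data and report the spread of its scores. This is only one possible approach, sketched on synthetic data; conformal prediction and Bayesian methods are common alternatives.

```python
# A minimal interval sketch: train an ensemble on bootstrap resamples
# and report the spread of its scores for a single case.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=1000, random_state=0)
x_new = X[:1]  # a case to score

scores = []
for seed in range(30):
    Xb, yb = resample(X, y, random_state=seed)  # bootstrap resample
    m = LogisticRegression(max_iter=1000).fit(Xb, yb)
    scores.append(m.predict_proba(x_new)[0, 1])

lo, hi = np.percentile(scores, [5, 95])
print(f"score ~{np.mean(scores):.2f}, 90% interval [{lo:.2f}, {hi:.2f}]")
# A tight interval (say 0.88 to 0.92) supports prioritizing the case;
# a wide one (say 0.6 to 1.0) says the model itself is unsure.
```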
Yeah. And I know your team, Mike, is doing some work developing tools around all of this; you mentioned an inspection report last week. It's a really interesting concept, because it's one of the things we're looking at too, but it's very challenging to do in practice. Maybe you can walk us through some of the work your team has done on that.

Sure. The idea is simple, and I think everyone, technical or not, has seen examples of it. You're going to buy a new refrigerator, and you go to your favorite site that has analyses of refrigerators, maybe Consumer Reports or some other place, and they give you a rating, a number from zero to 100. Or you're looking to buy a home: you like the home, you like the price, but you bring in an expert to do a detailed analysis of the home, the home inspector, of course. That scenario also comes up with AI models. Maybe you've made an acquisition and they have a bunch of models coming in, and you want to get some feel for those models. How good are they? They're already built; you don't know whether they were built the right way, and you want someone to come in and do an assessment. Or maybe you're buying a model from a vendor and want a similar thing, or you'd like the vendor to provide that sort of third-party inspection. So, based on all these pillars of trust we've been talking about, bias, uncertainty, robustness, testing, a bunch of other things, we've been looking at whether we can produce this detailed inspection report, a home inspection report for models. That's step one, and we're confident we can do it. Step two is the really tricky one: given we have all these details, some of the stuff Zayn was talking about with uncertainty, how can we make it consumable for, say, an executive or a buyer? For example, let's say I'm an insurance company in the business of insuring AI models. Zayn has a company, and he comes to me and says, hey, Mike, I'd like you to insure my model. And I say, sure, Zayn, I'd love to do that; here's what it's going to cost you. But before I can tell him the price, I need to inspect his model to get a feel for the level of risk. And really, all I need is a high, medium, or low, because those are the buckets I have for pricing. So the question is that level of abstraction: given all this deep analysis you can do on a model, how do you summarize it for somebody who maybe doesn't want all the details, who would like a Consumer Reports sort of version? That's a human, HCI kind of challenge, in addition to the deep technical challenge of assessing models. At IBM, we're doing a lot of work in this space, nothing to say publicly yet, but we would of course love to talk to people who'd like to engage and help us make sure we solve it the right way.

So, Michael and Zayn, we have just about three minutes before we want to start our Q&A. A quick reminder to the audience: if you have questions for Michael and Zayn on the topic, please drop them in the Q&A thread on the right side of your screen. Thanks.

I'll wrap us up with one final point. The thing I like about the inspection report idea is that it forces you into a framework where you're testing different aspects of the model, and I think that's really important. It's not enough just to run one or two tests and say my model is sound, it's good.
Especially if it's a critical model: business critical, high materiality, a model regulators will be looking at. Whatever the conditions surrounding that model, if it's important, you should probably test it through many different means and look at it from different angles. Now, people tend to argue a bit about the terminology, whether something is transparent versus interpretable versus explainable. We don't want to delve into the terminology argument; we just want to highlight the point that it's important to test the model in many different ways, because you can only trust it once you've done a lot of different things that help you establish that trust. The hard part of what Mike is trying to do is that context is very important when you're interpreting the outputs of something like an inspection report, because you can't always compare models apples to apples. A standard supervised binary classification problem is one type of problem. Say it exists in the fraud domain: there, you have the unique consideration that your data is extremely imbalanced. So the kinds of metrics you see in the fraud domain, you're not going to see in the lending space, or in other areas where the data isn't imbalanced. Because of these unique considerations, you have to consider the domain, the way the model is being used, and multiple aspects of the technical nature of the model itself. All of that context goes into how you analyze those outputs, because if you don't consider it, you're going to make a misleading judgment. That's where the challenge really comes in: how do you present this information to people, knowing that the underlying technical details of the things you're presenting are different? I think that's a really interesting challenge, I'm glad there are people working on it, and I'm curious to see where it goes. Do you have any last comments, Mike?

No, I'd just add to what you said in terms of context. It's not only a question of the right way to measure things along particular dimensions; there's also the question of which dimensions are even relevant in your particular use case. Some examples: you might say bias is always important, but if I have a model that's looking at defects in a manufacturing process, maybe bias, and I mean human bias, or bias against humans, is not as relevant there. Clearly, machine learning is built on bias in the mathematical sense, so of course that's relevant. Another example would be something like adversarial robustness, hardening a model so that attackers can't probe its decision process or its training data. If your model is deployed internally rather than externally, maybe you don't care about that as much. So, two simple examples to say that context matters in many different ways: the good ones Zayn mentioned, and also in terms of which dimensions you even care about.
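As a small aside on the imbalance point Zayn raised, here is a sketch of why a metric has to be read in context: on data with a 1% fraud rate, a model that never flags anything still reports 99% accuracy, which is why precision and recall carry the meaning in that domain. The numbers are synthetic.

```python
# Why context matters when reading metrics: on heavily imbalanced
# data, accuracy alone is misleading, while precision and recall
# expose a model that never flags anything.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([0] * 990 + [1] * 10)   # 1% fraud rate
y_pred = np.zeros(1000, dtype=int)        # a "never fraud" model

print("accuracy :", accuracy_score(y_true, y_pred))  # 0.99, looks great
print("recall   :", recall_score(y_true, y_pred, zero_division=0))  # 0.0
print("precision:", precision_score(y_true, y_pred, zero_division=0))
```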
So, since I think it's time, and we've got about nine more minutes, let's go to the questions from the audience. Thanks, everyone, for listening; we have a few questions already queued up.

So, Michael, if you see those questions there, you're welcome to read them out; otherwise, I can pull them up for you. Which would you prefer? I think I see them. Great. So the first one is from Indra Klein. Thank you. "What about cultural nuances that may not necessarily be viewed as bias?" That's a really good question. Maybe I'll start, Mike. We were talking about context, and this is why context is important: in some domains, having a little bit of bias is not necessarily a problem. You can run your tests and observe that maybe your fraud model is biased with respect to age, for example, because people who are older might not have as much education around cybersecurity and phishing attacks. They're more susceptible to being phished, and if they get phished, they lose their credentials, and a malicious third party can use those credentials to make fraudulent transactions from their accounts. So there's a bias in the model. Maybe that's a good bias; maybe the model should be stricter when it knows it's dealing with an older individual who is more susceptible to a certain type of attack. Another person could argue that the model should be completely unbiased with respect to age: whether you're older or younger, we should have the same rates for detecting fraud across all groups, and our model should be equally good for everyone. So there's a conversation that needs to happen there, and we need to involve the business stakeholders; they're the ones who need to advise us on whether it makes sense for the model to be biased in this way. And step one is that you need to do the testing, because if you don't do the testing, you don't know whether that behavior is actually there. So first you test, you identify an interesting observation, and then you take it back to the business and say: hey, we found this thing about your model. Does this make sense to you, or do we need to do something about it? That's how you start the discussion.

Yeah, I'll just add another example. I'm not sure you would label it cultural nuances, but it was certainly in the press. A major university on the west coast of the US was trying to determine who should get COVID vaccines first, and they actually used an AI model to do it. On the surface, it looked like they were doing something reasonable: prioritizing older, more vulnerable people. But the university had a medical school, and it turned out the model was prioritizing administrators who weren't actually seeing many patients over younger, healthier people who were seeing a lot of patients, because they didn't get the features right. Of course, there was a major uproar in the press. So if you haven't figured it out already, this topic is very challenging to get right. People often talk about the need to have diverse viewpoints at the table to spot as many of these biases as possible, and hopefully that could have helped prevent this particular incident.
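Here is a sketch of the "step one" testing Zayn describes: compare the detection rate (recall) across age groups and surface any gap for the business to review. The data and group labels are synthetic and purely illustrative.

```python
# Group-wise fairness testing sketch: fraud detection rate per group.
import numpy as np

def recall_by_group(y_true, y_pred, groups):
    """Detection rate (recall) among actual positives, per group."""
    out = {}
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)
        out[g] = y_pred[mask].mean() if mask.any() else float("nan")
    return out

rng = np.random.default_rng(0)
groups = rng.choice(["under_60", "60_plus"], size=2000)
y_true = rng.integers(0, 2, size=2000)          # actual fraud labels
y_pred = (rng.random(2000) > 0.4).astype(int)   # model flags

for g, r in recall_by_group(y_true, y_pred, groups).items():
    print(f"{g}: detection rate {r:.2f}")
# A material gap between groups is the observation you take back to
# the business: is this bias acceptable, or does it need mitigation?
```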
Okay, next question, from a different person. This is from Sophie Ann Sadeen: "First and foremost, since governance is a set of actions, it cannot be reserved to the C-level or management levels. Regarding AI, the problem resides with ethics and data quality, where AI becomes a vicious circle that feeds from data to curate data. For ethical reasons, I believe the right name for AI should be AA, advanced automation, where the policies, the rules, and the data are clear. What are your thoughts?"

So, I definitely agree on the need for clarity. If you look over the last few years in this general space of reining in AI, controlling AI, getting better governance of AI, there has been an evolution. Initially, a lot of companies and countries were coming out with their principles on AI. These were high-level, lofty goals, such as "AI should not be biased" or "AI should not do bad things that harm society," which is, I think, a reasonable place to start. But as a technologist who likes to get his hands dirty, I was a little frustrated by those things, because I wanted to actually implement something, and I wanted to know when I had succeeded. What do you mean by "it should not be biased," exactly? It's a similar thing with explainability, where there is actually regulation in various places saying AI should explain itself. The question is: what does that mean? If I give you an explanation, can you tell me whether I've satisfied the requirement or not? So I think, and this is a natural evolution, we'll hopefully see more maturity, where it becomes clearer what these various policies or wishes actually require, and that's going to help a lot in implementing things, measuring things, and knowing whether you're complying. Zayn, do you want to add anything?

Yeah, I'll just add that the reason our team exists is to be an independent third party that can come in, look at what a development team has done, and hopefully prevent them from falling into the vicious circle you mentioned. You can have risky situations where a model feeds into a model feeds into a model, and if anything breaks in that pipeline, everything goes down. We want to avoid situations like that, or, where the potential for them exists, there need to be policies and practices in place saying: there's an upstream model to this model, and if that model were to go down, here's what our team would do. Developers have to actually outline what the process would be in that situation. When there are unique considerations in play, when they're doing something out of the box, or where something gets a bit murky, we want to see them explain what they would actually do if they encountered a practical problem. So let's say your model's performance falls off a cliff, because you're using stale data, or because you developed a model that just wasn't stable over time. We want to know that the development teams are equipped to handle that situation. That's when we assess their monitoring framework. We look at what metrics they're monitoring, what exact thresholds they've put on those metrics, and how they're monitoring the model: what sample sizes are they taking, at what frequency are they monitoring, and what do they actually do when something happens? What is the actual process in place? Who gets involved?
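As an illustration of one common monitoring metric of the kind Zayn describes, here is a sketch of the population stability index (PSI), which compares a feature's distribution at development time with its live distribution. The 0.25 alert threshold is a widely used rule of thumb, but the exact threshold is a choice each team has to make and defend.

```python
# Drift monitoring sketch: population stability index (PSI) between a
# feature's training distribution and its live distribution.
import numpy as np

def psi(expected, actual, bins=10):
    """PSI over shared quantile bins; ~0 = stable, >0.25 = major shift."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 10_000)      # feature at development time
live = rng.normal(0.5, 1.2, 10_000)   # same feature in production

score = psi(train, live)
print(f"PSI = {score:.3f}", "-> investigate" if score > 0.25 else "-> ok")
```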
Just to be sure: our team looks at everything from a client-first perspective. We always consider the end users. How does this impact RBC clients? Then we backtrack from there, treating everything else as secondary to the client. So if there's a client impact, we tend to prioritize it very heavily, and we try to hold the internal teams accountable to that standard as well.

Great. So, Eric, Zayn, and Michael, thank you so much for this great presentation, and thanks to the attendees for tuning in. We are at time. Please complete your conference session survey on the page for this session. The next sessions will start in about 10 minutes, so we'll see you over there. Thank you. Thank you. Thank you both.