here and others like quantum that are on the horizon. And so we should not just be swept up in the hype of AI; we should understand what it is, how Africa can contribute, how Africa can lead, and how Africa can leverage this technology to solve problems. One message I would like to pass on is: let's go beyond the simple apps and the conference papers and really think about how we can have impact at scale. How can we develop these technologies so that they actually solve problems at scale? That's a goal of this seminar series: to really understand what's next in AI and what you can do. So I hope all of you will learn a lot from it, and I hope you will enjoy it. And with that, I would like to end. Thank you very much.

Thank you, Dr. Solomon Assefa. All right, those are some very important reminders about why we study AI: to have large impact, impact across continents. And that's something we're going to be talking about today. I now want to transition into our first set of three speakers, which will be followed by a Q&A session. We're going to bring all three speakers back afterwards for a question-and-answer panel, so please use the tools available here on Crowdcast to share your questions, and we'll cover those at the end of the three presentations. These three presentations offer different, unique perspectives on trustworthy AI.

The first speaker is Dr. Kush Varshney. Dr. Varshney is a research scientist and manager at our Yorktown lab in New York. He brings two unique qualifications to this discussion. First, he's one of the primary leads in IBM Research's effort to establish trustworthy AI; it's a very large task, and he's been driving it for years. Second, he's the co-founder and director of IBM Research's AI for Social Good program, which he's run for about the last three or four years. These points give Kush a unique perspective on the role of trustworthy AI. Kush, please go ahead.

Great, thank you, Skyler, and it's my pleasure to be here. I'll be giving an overview of trustworthy AI, which will be a lead-in to the rest of the presentations you'll hear today. A lot of what I'll talk about is captured in a book that I'm writing, Trustworthy Machine Learning. The URL is at the bottom of the page, and you're most welcome to look into that book. And if it's beyond your means, just send me a note, and we can figure out how to get you that content.

So just to get us motivated: there have been a lot of newsworthy articles illustrating that decision-making supported by machine learning can have unwanted bias, in many different areas, whether it's face recognition, healthcare, lending, education, hiring, or criminal justice. But what you might notice about all of these examples is that they're very US- and Western-centric. And this is a problem, right? There are actually many more examples that are a little less recognized when we talk about this topic. Even looking at the education example, with the A-level exams that were scored by an algorithm because they were canceled due to COVID: all the news is about this happening in the UK. But as many of you know, these A-level exams are also used in Cameroon and Mauritius and many other places in the world.
And so using these examples...

Hello, sorry to interrupt real quick. Do you have slides to share? If so, they're not being shown right now, so I just wanted to let you know.

Oh, okay. Let me see what the problem is.

That way you'll know you're sharing slides when they appear on the broadcast as well. I know I interrupted your flow. There we are, I think I can see it pop up now. Great, and then switch to presenter view when you're ready.

Okay, my mistake. Sorry. Great, thanks, Skyler. This was the book, by the way, that I'd mentioned at the beginning; the URL is at the bottom. And these are the examples on face recognition, healthcare, lending, education, hiring, and criminal justice. So as I was saying, this UK example with the A-level exams that were canceled due to COVID last year and then scored by an algorithm: it made a lot of news that this was unfair, but it was not part of the news at all that these A-level exams are used not just in the UK but in these other countries as well. That's something that gets missed when we're talking about AI bias.

So there are these so-called non-traditional fairness use cases that I think are more relevant in, for example, an African setting. When a telecommunications provider is rolling out infrastructure; when retailers are selecting people for an extra check; when forest managers are making decisions about where to plant trees, which has a big effect on indigenous communities who rely on the forests for their livelihood; when delinquency collections are being made; even recommendations in fantasy football. All of these are places where we have seen the use of bias mitigation and algorithmic fairness techniques, and they get missed when we talk only about the Western examples.

And trustworthy AI is not just about bias. There have been examples where Uber's self-driving car caused a fatality, and that was a result of the system not being properly calibrated. Or there was an example in Pittsburgh where an algorithm was used to predict whether patients would be readmitted to the hospital due to pneumonia, and this particular hospital had the property that people with asthma were much less likely to be readmitted for pneumonia. That is counter-intuitive, but it was true in this hospital because they had a special program for asthmatics; it would not hold up if you applied it in any other hospital. So this can be a big pitfall as well.

So what do we mean when we say trustworthy AI? A good place to start this conversation is by asking what we mean when we say a person is trustworthy. Let me begin with a specific example and ask you to think about it for a second. Do you trust Eliud Kipchoge to run fast? I think most of you will say yes, but why? The first reason is that he's very competent: on October 12th, 2019, he ran a marathon in one hour, 59 minutes, and 40 seconds. But this was done under very special conditions in Vienna, with a perfect setup for him to run that fast. Just as a side note, the picture on the left is my kids with their faces painted with 1:59:40 at Karura Forest in Nairobi, where we had gone for a watch party. So what else do you need besides one example of him running fast? Well, he's also been very reliable.
Out of all the marathons he's run, he's won them all except for a couple, and he has run very fast each time. And these have been run under very different conditions: some in cool conditions in Europe, and in the Olympic Games in very hot weather in Brazil and in Japan. So he's been reliable at this as well. What else do we know about Kipchoge? He's been very open about his motivations and how he approaches his running. There are many quotes from him explaining what he does and how he reaches his level of competence. And he's also selfless: he doesn't just train by himself, he runs with others and brings his whole community up with him.

And if we look at the organizational management literature, these are exactly the four attributes that have been identified for trustworthy people, and they carry over to trustworthy AI. The first attribute is competence: can the person, or the AI system, do what it's supposed to do? The second attribute is reliability: that competence sticks around in different conditions. The third attribute is some level of openness, or intimacy, or the ability to communicate back and forth. And the fourth attribute is that selflessness we talked about: the motivation to serve others' interests as well as your own.

These map precisely to what we want out of AI systems. The first attribute maps to accuracy: we want machine learning systems that are accurate, and this is what everyone has been working on for many years. It's the other attributes that are the new aspects we need to work on when we're aiming for trustworthiness. The second attribute maps to distributional robustness, fairness, and adversarial robustness: wanting these systems to work well in different conditions and for all people. The third attribute maps to explainability, uncertainty quantification, transparency, and value alignment, so that the machines tell us as people how they work, and we as people can tell the machines how they should work and what behaviors we want from them. And the fourth attribute is about using AI for social good and social impact applications, but not just that: also empowering all people around the world to use AI to meet their own goals.

When we put all of that together, we have this very busy picture of all the things we want out of trustworthy AI. I'm not going to read it in detail; I'll just flash it briefly and emphasize a few things. One is that we start with some basic capabilities, move up to pillars of trust like fairness, explainability, and robustness, then move up to transparency and testing issues, and put all of that together under one umbrella of governance. But on the sides we also have topics that are not technical but that we also need to address when working on trustworthy AI. Those include being precise about different application domains like healthcare and lending, making broader impact by interacting with civil society and governments, and focusing on the inclusion aspects as well: how to enable all people around the world to use these technologies.

There's a quote by one of the early CEOs of IBM, Thomas J. Watson, Sr., which I like quite a bit. He said that the toughest thing about the power of trust is that it's very difficult to build and very easy to destroy.
So we're not going to talk about destroying trust; what we're going to talk about is how to build it into AI systems. And there are just two words that, if they're all you take from this presentation, I'll be happy: no shortcuts. Let me explain what that means, first by coming back to our friend Eliud Kipchoge. He has this quote: "I always tell people that this is a really simple deal: work hard. If you work hard, follow what's required, and set your priorities right, then you can really perform without taking shortcuts." So he's saying the same thing I'm saying about not taking shortcuts.

First of all, we don't want to take any shortcuts in inclusion. As Solomon said, IBM Research has had a lab on the continent since 2012. And there's no shortcut to it: we can't sit where I'm sitting, in Yorktown Heights, New York, and hope to really do the right work for Africa. We need to be in the right places and work on the technologies there; stealing a phrase from Skyler, not just AI for Africa, but AI from Africa.

And then when we get into the technical aspects, we shouldn't take shortcuts anywhere in the AI lifecycle. This picture shows a typical AI development lifecycle, which involves several stages: a problem specification phase, a data understanding and data preparation phase, a modeling phase, an evaluation phase, and then deployment and monitoring. There are different personas involved in each of those stages: problem owners, data engineers and data scientists, model risk validators, and AI operations engineers. Normally this picture would not have the diverse stakeholders on the side, but I've drawn them in on purpose, which I'll explain in a second.

So as we go through these stages, what are the shortcuts we might be enticed to take but shouldn't? In the problem specification phase, the shortcut one could take is to just have the problem owner, usually someone in power, decide what to do, and that's it. What we want to do instead is take advice from a panel of diverse voices to provide input on whether a problem should even be solved at all. Is it an instrument of oppression? Would it lead to human rights violations? If so, that's not something we want to do. Often people in privileged positions don't have all the information; they have blind spots about what really matters to people in various, often less privileged, conditions. So that's one part of problem specification where the input of diverse stakeholders is very important. Second, once you've decided this is a problem to go forward with: what are the criteria by which a solution would be judged? Is it only accuracy, or should fairness, explainability, robustness, and other things be part of the performance indicators?

When we're in the data understanding and preparation phases, it's very easy to just take a data set we have and go forward with it. The shortcut we shouldn't take is to assume that all our data is great. What we should actually do is think through where the data came from and what biases it has. First, there are social biases.
These are attributes or characteristics of society, so that when we measure features, they already have biases encoded into them. Taking the A-level example: if certain cultural knowledge is already encoded into the questions, then someone coming from a different background will not be able to answer them as easily. The features themselves, the scores, are already biased. Then there are sampling biases: you might over- or under-sample certain groups and populations. There can also be issues of sampling in time: if you have a data set that came from the pre-COVID world and you apply it to a model making decisions in the post-COVID world, that mismatch will also lead to issues and biases. And then there are subjective choices that a data scientist makes when preparing the data that can exacerbate biases that might otherwise not have been so pronounced. One example from one of our colleagues that shows this involves healthcare data. African-Americans in the U.S. tend to utilize the healthcare system less, for the same level of sickness, compared to white Americans. This shows up quite a bit if you lump together all of the different kinds of healthcare utilization: inpatient, outpatient, emergency room, and so forth. But if you keep them separate, the bias is not very large. So this is an example where a data scientist, without even thinking about biases, does something that leads to more bias than would have been there otherwise. There can also be adversaries, people who introduce data poisoning at this stage. So keep all of this in mind before you go on to anything else.

The reason I've been using this "no shortcuts" phrase is that there's actually a pretty nice paper called "Shortcut Learning in Deep Neural Networks," and it's not just the human aspects of shortcuts: the machine learning models themselves can take shortcuts. They are just math, optimization problems, and they will take shortcuts to get to the solutions you instruct them to go for. They'll use the background of an image to predict that there are grazing sheep in an image when there actually aren't any, or they'll hallucinate objects from weird patterns, or they'll recognize a particular disease not based on the anatomy of the patient but from some writing or tokens that are in the image. Or you can add some spurious text at the end and they'll get confused. All of these are things the algorithms themselves can do.

But there are also things that we as people, the data scientists, can do to reduce some of the biases and other lack-of-trust issues that come up. There are different points of intervention: pre-processing, which changes the statistics of the training data; in-processing, which adds extra constraints or additional regularization terms in the model training; and post-processing of the predictions that come out of a model. And there are ways to do this for all of these different topics: robustness, fairness, explainability, uncertainty quantification, and so forth.

So let me go into these very briefly. An explanation is a justification for a machine learning prediction. Depending on who is consuming those explanations, and I'm sure Vera will say more about this later, there are different ways of explaining. And it comes about at different points in the pipeline as well.
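To make the intervention points above concrete, here is a minimal sketch of the pre-processing route using the open-source AI Fairness 360 toolkit that comes up again later in this session. The DataFrame, the `label` and `sex` column names, and the 0/1 group encodings are hypothetical placeholders for illustration, not values from the talk.

```python
# Minimal sketch: pre-processing bias mitigation with AI Fairness 360.
# Assumes a numeric pandas DataFrame with a binary "label" column and a
# binary protected attribute "sex" (1 = privileged, 0 = unprivileged).
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing
from aif360.metrics import BinaryLabelDatasetMetric

df = pd.read_csv("train.csv")  # hypothetical training data

dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

privileged = [{"sex": 1}]
unprivileged = [{"sex": 0}]

# Measure bias before mitigation: 0 means statistical parity.
before = BinaryLabelDatasetMetric(
    dataset, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("Parity difference before:", before.statistical_parity_difference())

# Pre-processing intervention: reweigh training examples so that favorable
# outcomes become statistically independent of the protected attribute.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
transformed = rw.fit_transform(dataset)

after = BinaryLabelDatasetMetric(
    transformed, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("Parity difference after:", after.statistical_parity_difference())
```

The same toolkit family also offers in-processing and post-processing algorithms, matching the other two intervention points Kush describes.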
Unwanted bias is anything that places privileged groups at a systematic advantage and unprivileged groups at a systematic disadvantage. We've already talked about where these unwanted biases come from: prejudice in historical data, sampling issues, and the data preparation stages, but also mis-specifying the problem. If you're trying to predict whether someone is going to commit a crime in the future but you look at arrests, that's a poor specification that can introduce biases, because, first of all, being arrested doesn't necessarily mean you're guilty, and second, people who are arrested more often tend to be in neighborhoods where the police are simply more active, and that's often skewed.

Now to adversarial robustness. An adversary is a malicious actor trying to meet their own goals to the detriment of the goals of the problem owner and the problem specification. Adversarial robustness is all about detecting, preventing, and certifying against attacks by malicious adversaries: they can poison training data, they can introduce weird samples as input to the machine learning model, and so forth. And when we work on adversarial robustness, it's not just about bad guys sitting in a corner trying to fool you; it's also a way for us to push AI to its limits as a test.

And then finally there's uncertainty quantification: does the model know when it doesn't know? To give an example, consider skin disease diagnosis, which you'll hear more about from Guillermo and Celia later. This is a problem where we have an image of the skin, with some lesion on it, and we're trying to diagnose the disease. For different types of samples the machine can be more or less confident. In both of the examples towards the right, the machine is more confident, and because it's more confident, we can place more trust in it. When it tells us it's less confident, towards the left, that can be a case where the human dermatologist gets consulted rather than having the machine make the decision on its own.

Moving along in our lifecycle, there are also shortcuts in evaluation. Again, we want a panel of diverse voices, not just one model validator doing the whole job, so that the model validator's blind spots get filled in by the diverse lived experiences of others. And then, in monitoring, it's very easy to focus only on accuracy once you have a deployed system; that will tell you if your system is degrading due to distribution shifts. But what you also want to monitor over time are things like fairness, robustness, and explainability.

Finally, we want transparency throughout development. We have a technology called AI FactSheets, which automatically captures many different facts about what is going on: intended uses, different processing steps, data engineering steps, different tests that have been conducted, and so forth. Once collected automatically and made available to different personas, these provide a lot of transparency and can be used to govern the overall process.

So just to summarize again: don't take shortcuts anywhere in the AI lifecycle. Just stop and think, and you'll probably do the right thing. That's the core and crux of trustworthy AI.
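Kush's skin-lesion example boils down to selective prediction: the model answers only when it is confident and defers to the human expert otherwise. A minimal sketch of that deferral logic follows; the probability vector and the 0.9 threshold are made-up assumptions for illustration, not values from the talk.

```python
# Minimal sketch: confidence-based deferral ("does the model know when it
# doesn't know?"). A real system would use calibrated probabilities or a
# dedicated uncertainty estimate rather than raw softmax confidence.
import numpy as np

def predict_or_defer(probs: np.ndarray, threshold: float = 0.9):
    """probs: softmax output over disease classes for one skin image.

    Returns (predicted_class, confidence), with predicted_class None
    when the model's confidence is below the deferral threshold.
    """
    confidence = float(np.max(probs))
    if confidence >= threshold:
        return int(np.argmax(probs)), confidence  # machine decides
    return None, confidence  # defer to the dermatologist

# Example: an ambiguous lesion where no class is dominant.
pred, conf = predict_or_defer(np.array([0.55, 0.30, 0.15]))
if pred is None:
    print(f"Confidence {conf:.2f} below threshold; consult a dermatologist.")
else:
    print(f"Predicted class {pred} with confidence {conf:.2f}.")
```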
So just to end, let me leave you with a Sudanese proverb that I learned at the Deep Learning Indaba in 2019: we desire to bequeath two things to our children. The first is roots, and the other is wings. Our wings to you are the open-source toolkits for AI fairness, explainability, robustness, and uncertainty quantification, plus a new toolkit coming out at the end of this month, Causal Inference 360. These are their websites. And the roots are what Karthi and Ezinne are going to talk about next: what are your values, and don't lose your values as you go forward in developing AI for yourself and your communities. So thank you, and I'll turn it back to Skyler.

Great, thank you so much for those comments, Kush. That was a really great introduction to the topics we'll see later today, capturing the breadth of what trustworthy AI is trying to achieve. There are some questions coming in right now; please keep those questions coming. We're going to have a session at the end of the three talks where we'll address them, but I do want to feed Kush a cheat question so he has time to think about it: what's the difference between trustworthy AI and responsible AI? That was one of the questions that came up. Don't answer it now; we'll save that one for the panel later on.

All right, now I'd like to transition to our second speaker, Dr. Karthi Ramamurthy. Dr. Karthi joins us also from the Yorktown lab and has been working in the trustworthy AI space for multiple years. A great piece of information you need to know about Karthi is that he's been the chief architect of the AI 360 toolkits. You saw those little flags come up on Kush's presentation that said IBM AI 360: Karthi is the architect behind those, compiling all of those different algorithms and putting them in one place. So he's got a really strong background in this space, and he's going to take the next 20 minutes to share some of his perspectives on trustworthy AI. Go ahead, Karthi.

Thanks a lot, Skyler, and thanks, Kush, for leading with that excellent presentation, and thanks to the organizers and the audience. I would like to take this opportunity to talk about perspectives on AI from different world views, in particular focusing on the values of humans and AI.

So what is the relationship between AI, humanity, and values? The first relation is that all products of humanity reflect the values, experience, and aspirations of the people who develop them. Simple examples are literature, art, and pretty much anything else we develop. AI is definitely a product of humanity, but those who design and develop AI systems are, unfortunately, a very small fraction compared to those who consume it, knowingly or unknowingly. Since AI is a product of humanity, we would want AI to work for the people, but unfortunately the amount of freedom people get when they use AI is not a lot. Say you want to watch a movie and you go to Netflix, and there is a recommendation algorithm that tells you which movies may appeal to you. You may think that's a good form of personalization, or you may not like it and want to switch to a different platform, but you still cannot escape the fact that there is an algorithm dictating what your preferences could be. There is no way around it.
So AI is definitely a product of humanity, but since the developers of AI systems are a very small fraction, with probably a very different set of values compared to those who consume it, it may not really work for the entirety of humanity. The other issue is that although methodological advances may happen throughout the world, the core ideas and principles are still heavily shaped in certain narrow scientific communities. And if you look at the national AI policies of different countries, even there you can see that they have already accepted what AI is, how it should work, and how it should look, rather than doing a deep introspection based on their national context.

So what I want to focus on is different perspectives on AI: from your viewpoint, from my viewpoint, from everyone else's viewpoint, what is AI? Let's first start with the question: how many world views are really there? Are there only non-Western and Western viewpoints, which is what is popularly discussed in the literature? When I started thinking about it, I got really confused, because that's not really the case. I am Indian and my upbringing is Indian, but even within India, if you travel a hundred kilometers across the country, you will see that the values may change completely. So there are millions of different viewpoints that anyone can take on AI. What, then, is this seminar going to focus on? I'll focus on values and viewpoints that appeal to me personally, but since we are all human beings with a shared humanity, I hope many of these values will appeal to others too. And if they don't, that's still okay, because the main goal of this seminar is not just to talk about the values that we think are important, but to encourage the audience to think about AI critically from their own viewpoint and see how it relates to their personal experience and aspirations. What do you think should be the values of AI? That is the main takeaway message of this entire presentation.

Before jumping into individual values, let's think about the value of value systems. When human beings started evolving, having a value system at all was a huge evolutionary step. It's conceivable that there are some values in the animal kingdom, but not the whole wealth of values that humanity has developed. It's a fine civilizational point: having a value system is itself a big step for humanity, and making it enhance the common good is an important meta-value. In our languages in India, this is called Dharma in North Indian languages and Aram in my own language, Tamil. Loosely interpreted, it is the right way to lead life, a standard of conduct. The absence of Dharma or Aram is known as Adharma or Maram, which basically translates to "might is right."

So let's think about the value of value systems using a simple example. Say there is a bank in a town that lends money to people, and the only way it decides whether someone gets a loan is their income alone, because that is the least risky path for the bank. It's not hard to imagine that this bank will contribute to an immense increase in wealth inequality and put already marginalized people in a precarious situation.
Now, this doesn't happen very often, because there are regulations governing how a bank should operate, and many times banks also lend money to poor people. But suppose there is a bank like this. The question is: can this bank do better in terms of values? Obviously, yes: this bank should put the common good above its own good. Now, as a thought experiment, replace the bank with an AI. In unregulated scenarios, AI could be making decisions like this, deciding based on what is most beneficial for the few people who benefit from that AI, without considering the other stakeholders. That is why we need a value system incorporated into AI, so that it serves the common good and humanity rather than a very few small actors.

The first value I'll talk about is called the long view. I'm going to tell small stories, because an American poet, Muriel Rukeyser, once commented that the universe is made of stories, not of atoms. So I'll use tiny anecdotes and try to think about the values from that perspective.

Say there is a parent who allows and encourages a child to watch TV excessively, because the child likes watching TV very much and is gratified when doing so. As the child grows up, what will it think of its parent? Say the child is growing up in a good neighborhood, and it sees the other kids achieving academic excellence, getting good jobs, and doing well in their lives, but this child missed out on so many things because it was watching TV most of the time instead of focusing on its studies. So as they grow up, they may start resenting the parent. This is a very common thing: I'm sure our parents, and many of the parents on this call, tell their children: don't watch TV, focus on your studies, do something useful with your life. This is taking the long view, which means prioritizing the long-term good over short-term gratification.

Now the question is: do we think AI systems take the long view when interacting with humans and serving humanity? It's a tough thing to answer, because there may be some AI systems that do. But say there is an AI system that keeps recommending you movies even in the middle of the night, when you have to go to sleep. Is it really prioritizing the long view? Probably not. So this is a value that could be incorporated into many AI systems.

The second value I'll talk about is: what is the other person thinking? Say two friends meet up for a coffee after many years out of touch. One of the friends keeps talking insistently about their own experiences during the time they were out of touch, and doesn't show any concern for the other person's life. What do you think the second friend will think about the first? Do you think they will meet again? It's very unlikely, because when you meet someone, you want to express genuine interest in their welfare, and you want them to reciprocate by expressing interest in yours. This is a key human value that nurtures relationships; without interest in others' welfare, there is no human relationship.
So the question then is: do current AI systems take a genuine interest in your welfare and care about what you think? And do current AI systems encourage you to take a genuine interest in others' welfare and what they think? This is something to think about. If an AI system is just pushing what it has at you and doesn't care what you think, then maybe this value has to be incorporated into that system. And when I say AI systems, I don't mean AI as an entity; I really mean the people who develop AI and design these systems, who should take some of these things into account.

The next value is the importance of context. This has got to be one of my favorite values. Say a wise woman has three disciples, and two of them come to her with the same problem. She advises one of them to do one thing, and the other something completely different. The third disciple is really surprised and asks her: what are you doing? You are advising two different people to do completely different things, even though they have the same problem. And the wise woman explains to the disciple that the two are in completely different contexts, which this third person is not aware of. The disciple is now satisfied: yes, you're right, if they are in completely different contexts, then it makes sense that they have to do completely different things. This is something we are really taught, and it's one of the pillars of rationality: if we take things at face value without understanding the underlying context, we cannot be called rational human beings. But the question then is: are AI systems context-aware, and are they able to comprehend the complexities of seemingly similar situations when taking actions? Seemingly similar situations may have completely different contexts; taking them at face value is like looking at a single snapshot in time and forgetting the entire past. Are AI systems capable of handling this? Sometimes yes, but many times no. And this is something that can definitely be incorporated into systems to make them more useful for humanity.

The next value I'll talk about is ambiguity. Say a fisherman went to catch fish in a lake, but a recent storm had muddied the waters. The fisherman went back home without catching fish that day, because he knew he wouldn't get much; he returned after a few days, once the mud had settled, and then he caught huge amounts of fish. The mud is akin to ambiguity: in an ambiguous situation, we usually wait to gather more facts, or until the situation changes, before making decisions. This is also something parents teach their children all the time: if there is a really difficult situation and there are not a lot of facts, they will ask the child to wait. Why don't you wait a couple of days, see what's going on, and then decide? This is such a common human thing, but the question is: can AI systems recognize ambiguous situations and take actions accordingly? Or are they always eager to take some action without considering whether the situation is ambiguous?
There have been some advances in AI, like uncertainty quantification and understanding uncertainties, and that is a really good step, but there is a long way to go before AI systems can fully recognize ambiguous situations.

The next value I'll talk about is criticism. Say a ruler has yes-men for ministers and nobody to criticize their actions. And say there is a national emergency — and we do have a lot of national emergencies going on in many countries now. During a national emergency, if you are not amenable to criticism, if you are not willing to take in criticism and modify your stance, then you will be awfully underprepared. And the problem is that it's not just the ruler who suffers; it's the whole nation. So being able to accept criticism and modify behavior is considered highly beneficial. This is also something we talk about all the time; it's a common-sense thing. But now the question is: say you are interacting with an AI system and you think its behavior is unacceptable. Is the AI system able to take your criticism and modify its behavior based on your feedback? That's a key human value that could immensely benefit AI systems.

Next, different sides of the truth. Four people are blindfolded and asked to touch an object and say what it is. One says it's a long tube, one says it's a rope, one says it's a pillar, and one says it's a big leaf. It was actually an elephant, and the people had touched its trunk, tail, leg, and ear. What this says is that with limited sensory perception and a limited viewpoint, it's very hard to understand the truth from superficial information; a deep analysis is required. What would have helped is if these four people had gotten together, discussed what was going on, thought about what it could be, and taken other things into account; then there's a chance they would have concluded that it was an elephant. So now the question is: do AI systems know that they are likely representing one small side of the truth, and are they capable of deep analysis to widen their comprehension? These are actually two different questions: first, the capability to understand that they represent a small side of the truth, and second, the capability to perform deep analysis to widen their comprehension. Both are extremely good values for AI systems to take into account.

The next value I'll talk about is agency. Say there are 100 families in a village, each family has two children, and the village school can teach only 100 children. Now, is it fair for one child from each family to go to school, or is it more fair for both children from half the families to go to school? This is actually a trick question, because there is really no one correct answer. What is fair and what is not fair should not be decided by some arbitrator sitting outside the village and the community; it should be decided by the families and the communities themselves. That's why it's a trick question. So even though we have many different definitions of fairness that may address some of these issues, and that's a very good step for AI,
AI systems should still be designed to allow individuals and communities to act according to their best interests. That is what will provide agency to the individuals in the community, and it has to be done at the design phase. There is some legwork happening, as Kush mentioned, in involving other stakeholders in design, but there is a long way to go before we can realize the fruits of those efforts.

The next one is the need for explanation. Say a child has two parents, and both parents always show the child what the right thing to do is, but one parent explains the reasons for the suggestion in a friendly language, whereas the other does not. Now the question is: which parent is the child more likely to follow? I think the answer is obvious: the child will probably follow, and probably even like, the parent who explained the decisions in a friendly language. As human beings, we are curious creatures; we always want to know what's going on and why something happened. It's one of the things that really preoccupies us: why did it happen? What could I have done to make the situation different? These explanations and reasons for decisions help humans situate an event in a larger context and ultimately derive meaning for their existence. So the question we can think about is: are AI systems capable of explaining their processes or decisions in a reliable manner when they make decisions? Are AI systems essentially helping us derive more meaning from our lives? Because if you get enough explanations about what's going on, you may be more satisfied with your life and have more control over it. That's something AI systems can definitely incorporate, and there is a lot of work here; explainability is actually one of the hardest research topics in AI. We are making some headway, but there is a long way to go before we can derive explanations that are fully satisfactory to humans and truly describe the reasons behind an event.

The next value I'll talk about is the non-utilitarian outlook. Indian Railways has about 1.25 million employees and a route length of about 68,000 kilometers; until a few years ago it even had its own budget, running into billions of dollars. For many years, though, it has run at a loss. When I was growing up, the news every year was that the railways had suffered a loss, but it has been very restrained about cutting routes or increasing fares, since it's a lifeline for many communities. The railways was not just looking at profit, but ultimately serving communities. Sometimes service to the community can be more important than profit, particularly when the benefit the communities derive is much more than the profit that, for example, the railways gets. So the question we have to ask is: do AI systems incorporate non-utilitarian objectives, at least when serving the most vulnerable and the underprivileged? This has to be done by design; it cannot happen automatically. We need at least some applications where AI systems take a non-utilitarian outlook and think only about service to the community.

Having said all this, let's now think about the way forward for AI.
The first thing I would like us all to realize is that AI is a product of humanity. It's not a magic box; it's not an esoteric concept. It is something that was designed and developed by humans, so it is a product of humanity. Why is this important? Because it gives us the power to shape its development to achieve the larger goals of humanity. If it were not a product of humanity, if it were an esoteric natural process that humans had no control over, there would be no way to shape it. That's why it's important to understand and realize this. In that process, we should also remember that AI systems should not just be mathematical abstractions, fun things done by developers sitting in some corner of the world; they should be intricately tied to the values of humanity if the gains are to be realized by the people. And most importantly, developers of AI systems, whether in the present or in the future, must be keenly aware that ultimately they impact people's lives. When you write a piece of AI code, it's not just for fun — of course it's also fun — it might impact someone else's life. So you have to be very, very careful and incorporate as many trustworthy AI principles, and as many values, as possible when writing that code, even if it's a small function that will go into an AI system. And finally, the take-home message: the audience of this seminar series should embody the values they care about, for themselves and their society, in AI. This is really a huge challenge for us. AI can be very, very beneficial to humanity, but unless it has the values that we appreciate and care about, how can it ever serve humanity properly? That's something we should really be conscious of and careful about. Thank you so much; that's the end of my presentation.

All right, thank you so much, Karthi, for those thoughts. I've worked alongside Karthi for a couple of years now, and just in the last 20 minutes I've gained a broader perspective on trustworthy AI from his Indian background. So well done. We want to continue this theme of different perspectives on trustworthy AI, and we're going to transition to our third speaker before returning to a panel session. Our third speaker is Ezinne Nwankwo. She was an IBM intern in the AI for Social Good program over the last three months, and I believe within a few weeks she's starting her PhD at UC Berkeley. Once again, we have about 20 minutes for her to share more unique perspectives on trustworthy AI. Take it away.

Yeah, thank you, Skyler, for the introduction, and thank you, Karthi and Kush, for laying the foundations for my presentation. A lot of ethical AI frameworks have been introduced in recent years, and as Karthi discussed, products of humanity reflect the values of the people who develop them, and AI is no different. So in this presentation, I'm going to dive a little deeper into some of the values Karthi has discussed already that I think are of particular interest to communities that I'm a part of, and that I believe the audience is a part of as well. Just a brief outline of this talk: I'll do a quick introduction to some social contract theory, because that's where a lot of ethical AI frameworks tend to start.
Then I'll talk through what I found was missing, and through some examples of work that's being done. Some of these are things I've worked on, but I'm also really highlighting the amazing work of other researchers in the field, and more work that needs to be done as well. I can't cover all the work being done, so just know that there is a lot, and more avenues of research to explore. I also want to say that I'm still learning about all of these topics, and I hope that encourages people in the audience to learn about the things they find interesting and contribute where they can. In fact, in this presentation I think I have more questions than answers, so I hope this encourages everyone to contribute, because there's still a lot of work to be done.

So I'll start with some social contract theory, because, like I said, it shapes a lot of the ethical frameworks in existence today. Essentially, what a social contract does is give a guide for how individuals are supposed to relate and interact with each other in a society. And because AI is permeating so many aspects of society, this building block is where a lot of frameworks start when determining how AI should interact with our society and how we should interact with AI.

For starters, Thomas Hobbes was a philosopher who defined social contract theory as a method of justifying political principles by appealing to agreements that would be made among what he called suitably situated, rational, free, and equal persons. Essentially, he believed that the social contract mainly lies in the abilities of individual human beings who willingly decide to form a society, and that the best way to understand a society is to go to the individual beings who constitute it. He also said that without social order and certain constraints, humans would revert to their individualistic nature and problems would arise. Other philosophers dive deeper and enumerate things that must be protected under the social contract between individuals, such as human life, human liberty, and property.

As AI has developed, we see these values and ideologies embedded in a lot of AI systems. For example, there is a popular undergraduate AI textbook, Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig, that defines a rational agent as one that does the right thing. But as we all know, "the right thing" is contextual and culturally dependent, and we'll talk more on that later.

So as concerns around the benefits and harms of AI have increased, a lot of frameworks around the way AI should interact with society have started to develop. Caron and Gupta proposed a social contract for AI — I have the reference in case anybody is interested — stating that AI systems should be developed and implemented with four key elements in mind: a socially accepted purpose, responsible methods, socially aware risk, and socially beneficial outcomes. Similarly, Floridi et al. discuss the opportunities and risks associated with AI systems and propose 20 specific recommendations categorized into five groups.
Those are that AI should promote well-being, do no harm (non-maleficence), respect human autonomy, support justice, and demonstrate explicability. These are just two examples of ethical frameworks around AI, but there are so many more. A lot of companies have introduced guidelines in the past few years for how they believe AI should be developed, and even academic societies like the ACM, the Association for Computing Machinery, have developed their own guidelines. So there's a lot going on in this space, and most of these frameworks are very similar to each other, referring to the principles I just mentioned and to ones mentioned before, as in Kush's presentation. Similarly, more specific concepts like fairness, explainability, and transparency are also important to consider and have been included in these ideas and frameworks.

While these principles are very much needed and guide the creation of trustworthy AI, there are viewpoints and perspectives that are missing. And the fact that they're missing is, I think, an indication of deeper-rooted power imbalances that exist globally and can be traced back to issues such as colonialism and so on. So I'll give some examples of the missing perspectives that Karthi also touched on in his presentation, highlight some other philosophies that touch on these concepts, and relate them to work that I've seen in AI.

As an example, Kwame Gyekye was a Ghanaian philosopher who wrote largely about communitarianism and personhood in Africa. He wrote that communitarianism sees the community not as a mere association of individual persons whose interests and ends are contingently congruent, but as a group of persons linked by interpersonal bonds, biological or non-biological, who consider themselves primarily members of the group and who have common interests, goals, and values. This, I think, takes a pretty different perspective from the ideologies we saw earlier in the formation of social contract theory. So fundamentally, the social contract is going to look different, specifically in Ghana, where Gyekye is from, and in a lot of other countries across the globe, on the continent, everywhere. That definitely needs to be taken into account. But how do you do that? How do you build community-based AI systems, systems that allow communities to participate in their design, development, and evaluation? Like I said, I don't have all the answers, but one initiative that I feel embodies this ideal, if you don't already know it, is the research initiative called Masakhane, which strives to ensure that African languages are represented in the field of natural language processing. Their work aims to be inclusive of all African languages and takes feedback from community members to improve translations across different languages. So that's one example, and I think we can dive even deeper into taking input from different stakeholders and communities, taking a more participatory approach to developing AI systems.
And there's a lot of good work to be done, and being done, on this ideal.

Then there's also context. Context matters; I think a lot of people agree with that sentiment, and it can be the thing that makes or breaks a decision, for humans and for machines. If one has data but doesn't understand the context behind it, we build systems that do more harm than good, and there are a lot of examples that show that. So decision-making using data needs to be context-driven and culturally aware, because interventions that may be right in one context may not be right in another. In a recent paper, which I'll link here, the co-authors presented many challenges that researchers and data practitioners face with data sharing in Africa. They make a lot of interesting points, so I would highly recommend taking a deeper look, but one point in particular is that understanding context and the local norms of communities is essential to understanding data, and yet it's often very much overlooked and a major barrier to data sharing. Often, the individuals and communities who make up a data set are not properly acknowledged; they're not considered part of the set of stakeholders when they need to be, and they don't directly benefit from the technology being built. To me, that's a problem, and the question here is: how can we make sure that context is taken into account? Some other questions that I've thought about, or that I've seen work being done around, are: what does privacy mean for different communities across different countries? How do people interact with social media, and what does that mean for them? What are farmers in different countries most concerned about when it comes to AI and data, and how can we build technology that takes those concerns and understandings into account? It's dangerous to assume that all communities face the exact same challenges. These are a few examples of how I think we need to first study what contexts and norms exist, and get a sense of the landscape and the different trends across communities, before trying to tackle some of the bigger challenges.

And then the last concept I'll talk about is trust building. There's not necessarily a technical solution involved in this process. The idea is that you want to build trust with the relevant stakeholders, especially those impacted by the work and those who are the subjects of data collection. If we want to develop AI for the benefit of all those involved, it must be applied to improve the well-being of all stakeholders, rather than add to a specific group's comparative disadvantage. The trust of the users will be crucial for such applications, and in practice it varies widely; there are a lot of factors relevant to building trust with somebody, such as age, educational background, health and economic situation, and other cultural factors.
As an example of this kind of trust building, there's work I was able to do firsthand with my collaborator, Dr. Dina Machuve. She is an AI researcher and, fun fact, a local farmer in Tanzania. She had a deep understanding of the technology as well as the obstacles that farmers in Tanzania face, and she was able to build trust within the community because it was her community: she knew firsthand the issues they faced and worked with them to find appropriate solutions. The main things I took away from the work she did to build that trust were having sustained communication with the farmers and taking the time to explain the technology to them without hyping it up: not saying it's going to solve all of their problems, but taking one piece of the puzzle and saying, this is what we are hoping to do for you and for the benefit of the community, and having them be part of the whole process from the beginning, back when this was just an idea, to now, when there is an actual application.

So the question is: how can we ensure that trust is built and maintained with all the stakeholders? In addition to the lessons I learned with Dina, I think another way is increasing the inclusion and support of researchers from the communities that are heavily impacted. The research goals of this work don't try to predict the unpredictable or pass any moral judgments; they try to give access to information and increase representation in the field of AI. With these goals in mind, I think AI can really have the greatest positive impact globally, and this is one factor in building trust out of so many more. So that is still something that needs to be built out.

So what's next? That is, I guess, the question of this seminar series. There are a lot of lines of work to be done far beyond the ones I've mentioned here today, ranging from agriculture to tech policy. There are a lot of countries starting to build out, or that have already built out, policy and infrastructure around AI, especially on the continent, which was not the case a couple of years ago, when a lot of the AI policies and strategies were coming from the Global North; now we are starting to see a lot coming from the continent. Ultimately, I think where we go next should be decided by Africans and for Africans, and even within that there's a diverse set of perspectives: you're not necessarily going to see the same solutions for Ethiopia as for Kenya, and even within countries, as Karthi mentioned, there's a diverse set of challenges. But ultimately, I think there needs to be a shift in power dynamics in terms of who controls the data, who benefits from the technology, and where stakeholders are involved — I think they should be involved right from the beginning. If we're to avoid irreparable harm, a shift where people are no longer just consumers or data subjects, but creators, designers, and builders, is an important next step, and something that is already underway, that people are working really hard on.
And so I would just encourage everyone who wants to get involved, and who isn't already, to get involved by supporting the many existing initiatives, such as Data Science Africa, the Deep Learning Indaba, and Black in AI, to name a few; there are a lot more than that. So if you are looking to get plugged in, please do feel free to reach out. And thank you so much. Thank you, Azine. We are now going to bring back all three of the speakers who have presented in the last hour or so and hit them with a few questions. I know I already fed one to Kush. Karthi, I'm going to feed one to you so you can think about it for a little bit, and then I'm going to put Azine on the spot first, okay? So, Karthi, you get a bit of time; Azine, sorry. Karthi, one of the questions that came up is: can you give a few examples where the AI Fairness 360 toolkit has been used to address some of the different problems you mentioned in your talk? A few real-world use cases where the AI Fairness 360 toolkit has been able to step in on fairness or explainability or robustness. So think about that for a little bit. But on the spot, Azine: I know you are heavily involved in Black in AI. Can you spend a minute or two talking about that program? I don't think too many people in our audience are necessarily familiar with it, so here's your chance to stump for it. Yeah, of course. I am part of Black in AI, and we're an organization whose mission is to increase representation in the field of AI. We do that in various ways. Right now our main programs run through our annual workshop, as well as social events at other conferences. The main one is at NeurIPS, which is happening virtually this year, so look out for that. We also provide a lot of financial support so that people are able to attend, submit their work, and be showcased. I actually met Kush through the Black in AI workshop as well, so there are a lot of good connections to be made. We also have an academic program that helps people with the graduate school application process, either for master's or for PhD programs. We hold seminars where people can come, ask questions, and demystify the entire application process. And we've worked really hard this past year to remove the GRE requirement from a lot of these programs. That was successful, and now you see a lot of programs are in fact removing that requirement. So yeah, we have a lot of work coming up. If you're not already plugged in, feel free to go to blackinai.org, where you can be part of the forum. We post job postings and many other opportunities to get involved there. And there's more on the horizon related to entrepreneurship and other areas where we're working on representation. So stay tuned. Great, thank you for that. Okay, now back to Kush. The question we had earlier: can you compare and contrast responsible AI and trustworthy AI, and maybe give your working definitions of those? Yeah, well, that's a great question, and it's one that we hear more and more these days.
Yeah, I mean, partly it's just a choice of words, but one of the answers that we give, which I'll give right now, is that responsible AI is a blanket over three different topics: AI ethics, trustworthy AI, and AI governance. Let me describe each of these three in turn, and that will hopefully give you the picture of responsible AI overall. AI ethics is all about thinking through the right principles and values: where we should be going, what AI should and should not be used for, and so forth. Trustworthy AI is taking those principles and policies and actually operationalizing them, in terms of the actual methods, algorithms, and technologies: the topics of fairness, explainability, robustness, transparency, and so forth, and coming up with the metrics and the algorithms. And then AI governance is the end stage: once you have these methods, how do you actually institute them in organizations, in any sort of deployment, and so forth. So if we think of those three as sub-pockets, then we have responsible AI as the overarching theme. All right, great. Yeah, it's really interesting how those interact and perhaps even envelop each other, so thanks for that distinction. Karthi, we're coming back to you: examples where we've actually seen some successes in bringing in more of these algorithms that do better robustness, better fairness, better explainability. Give a few examples. Yeah, internally we have made a lot of efforts. We are trying to include many of the trustworthy AI principles in the internal processes at IBM, and we are trying to use these toolkits internally. I would say that's a big thing, because IBM is a huge company. And also in some external-facing aspects. For example, we have an engagement, which was also announced publicly, with the advertising team in IBM to bring about changes in how we advertise with fairness constraints in mind. That is one place where we are bringing in the AI fairness toolkits, and one of the goals is to actually change the way advertising works by including some of the fairness principles in the design stage itself. Also, I got randomly pinged on WhatsApp by one of my friends working at Nike, who said: you guys have done a great job with the AI Fairness 360 toolkit; we are using some of the tools to institute some fairness metrics. So we do hear about these from many different places. It's sometimes hard to track because the toolkits are open source, so anybody can use them; that's one of their powers. And we have also done analysis using real-world datasets that are pretty widely known, like the medical expenditure data, which is a gold-standard dataset in the United States. We have seen how fairness issues can arise for communities like African Americans; I think that's an example Kush mentioned in his presentation, how Black Americans can actually utilize the healthcare system less even though they may be more sick than white Americans. We have measured these things using the AI Fairness 360 toolkit. So these are some examples for fairness, and we have examples like this for explainability as well.
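To give a concrete flavor of what using the AI Fairness 360 toolkit looks like in code, here is a minimal sketch of computing group fairness metrics on a binary-label dataset with the open-source aif360 package. The toy data and the choice of protected attribute are illustrative assumptions, not the actual IBM pipelines just discussed.

```python
# A minimal sketch of measuring group fairness with AI Fairness 360 (aif360).
# The toy data and the "sex" protected attribute are illustrative, not the
# actual advertising or medical expenditure pipelines mentioned above.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

df = pd.DataFrame({
    "sex":    [0, 0, 0, 1, 1, 1, 1, 0],   # 1 = privileged group (assumed)
    "income": [1, 0, 0, 1, 1, 0, 1, 0],   # 1 = favorable outcome
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["income"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"sex": 1}],
    unprivileged_groups=[{"sex": 0}],
)

# Statistical parity difference: P(favorable | unprivileged) - P(favorable | privileged).
print("statistical parity difference:", metric.statistical_parity_difference())
print("disparate impact ratio:       ", metric.disparate_impact())
```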
So, yeah, the whole field is only a few years old, but we have already seen a huge movement in how people use these toolkits, and we have seen quite a bit of real impact from them. All right, cool, thank you. Another question, and this one I will open up to all three of you; don't all answer at once, I know that might be pressing. A lot of what we've talked about so far has been more on the idea of narrow AI: a model designed to do one specific task. What can you comment about trustworthy AI when it comes to artificial general intelligence? Do the same rules apply, or does it require a different view? If you could, maybe expand on that a little bit. Yeah, I can start, maybe to give Azine some time, since she was put on the spot before. So, actually there was a really nice article by Karen Hao in the MIT Technology Review a couple of years ago, which emphasized what's important now when we're talking about trustworthiness or harms of AI, versus the long-term future. Artificial general intelligence, for those who don't know, is the idea that in some future state, AI will be so powerful that it has equal or even better capabilities to think than humans do. And there are a lot of dystopian versions of how that might play out, including AI taking over the world and subjugating humans, and all sorts of bad science-fiction things that could happen. Yes, those are in the consciousness of people and researchers, but really it's the small things right now that I think are the most important to focus on. If there's an AI system that's helping with mobile money lending approvals and there's just a 2% difference in the approval rate between, let's say, women and men, that is the sort of thing that is most important to address right now, and that's where our trustworthy AI work is focused. Yes, the AGI sorts of things may come up later, but if we don't focus on what is affecting society and life right now, then I think we're going to be harming ourselves even more than whatever might happen later on. Yeah, no, I agree with that sentiment as well. I'm not in the field of AGI specifically, but from the few papers that I've read, it does seem that the field is still forming, still deciding how it feels about all that. For me, the type of work that I like to do, and that I find really impactful, is the work that addresses what is impacting people right now, what is harming people right now. Honestly, I feel it is somewhat of a luxury to think that far ahead. I wonder, if we focused our efforts on the things that are impacting underserved communities now, and communities that are harmed now, how much further along we'd be in that process before starting to think about AGI. That's not to say, who knows what will happen when it's created, when it's developed, but I think focusing on the issues we have at hand is more than enough to fill a lifetime. All right, thanks for that. Karthi, yes, no, you can pass. You've got more questions coming. Yeah, I'm not so worried about AGI at the moment, to be honest. It's somewhere out there, probably, who knows, but right now we have so many things to deal with, and the here and now is more important.
And also, if you think about it, the question becomes: will you have one AGI system, or a population of AGI systems, like human beings, one assisting each person? We don't know, right? But I believe that if and when that happens, it's still the same values we think about, like what Kush and Azine spoke about. They still apply, and they will probably become even more important at that point. Yeah, I'm done. Let me add one more thing on this topic. One of our colleagues at IBM Research, John Richards, recently wrote a paper on individual trust versus institutional trust. Everything we've covered so far has been focused on single machine learning systems for a single task, where fairness, explainability, robustness, and so forth are important for building trust. But in the popular consciousness, AI is just something out there, and that something needs a different type of trust: what we would call institutional trust. If you trust the banking system as a person, you don't go in and look at how the checks get read and how they get routed and all of those things. What you need to know is whether you can trust the institution as a whole, whether it's the banking system, the Indian Railways that Karthi mentioned as another example, or the mail system, the postal system. For all of these, you don't go in and trust the components or inspect individual processes, but the whole thing. And for that, what we have recognized is that certification, fact sheets, and transparency are the key, rather than the individual components. So there has to be a balance between both as well. Great. I think that's a very good point to end on, and that is now, unfortunately, going to conclude our question and answer session for the panel. I think we'll have our producer behind the scenes transition away Kush, Karthi, and Azine. Again, thank you all, speakers, for your time this afternoon. We are now going to have a halfway intermission session with an academic, Dr. Benjamin Rosman from the University of the Witwatersrand, or Wits University. I think there are two important things you need to know about Dr. Rosman. First, he runs the RAIL lab (Robotics, Autonomous Intelligence and Learning) at Wits University. Second, he's one of the co-founders and directors of the Deep Learning Indaba, a recent conference series that, before COVID at least, held sessions across the continent, really promoting artificial intelligence and machine learning capacity from the continent. So Dr. Rosman is now going to share his thoughts on this space. Benji, go ahead, take it away. Hi, thanks so much, and it's great to be here. It really is an honor; it's a very important topic being discussed today. I'm going to talk about something maybe a little more technical and a bit more specific, which is the area that I work in: reinforcement learning. It's not a very long talk; I'll try to touch on some of the ideas that have come out of our research, and hopefully this is interesting. So I'll be looking at this idea of moving towards trustworthy reinforcement learning, which is a bit of a different topic to what people often think about in this space. The kinds of problems that I'm really interested in are something like this.
You've got an agent or a decision maker, which might be a robot, in some environment, which could be a factory. And we're trying to have the agent learn complicated behaviors autonomously, by itself. We'd like it to be able to move around and pick things up, maybe do something quite complicated like assemble different things or clean up an area. Now, this is quite a complicated thing to learn. Historically people would hard-code this, but we're in the era of AI and machine learning, so what can we do to learn here? Well, this is the reinforcement learning problem. The idea is that we have our decision maker in the factory, our robot, which gets to take actions. The state of the environment, the configuration of the world and of the robot itself, changes as a result of the actions we take. In addition, we get some sort of reward from the environment that tells us when we've done something good or bad. The learning process is really trying to optimize the actions you take so as to maximize the amount of reward you receive. This is quite a complicated process in general, for a number of reasons. Firstly, the action space could be really large and potentially continuous: you might be able to move your arms to different angles, and there's also the question of temporally extended actions, which means I could choose an action that is a single small movement, but I could also choose one that says walk to the door, which would take quite a long time to execute. The next problem is the state space, which in any interesting real problem is continuous and high-dimensional, which makes it difficult to work with as well; we have to have clever representations to make this work. And then finally there's the question of reward, which we often say is sparse and delayed. This is challenging because if you define the wrong reward function, your robot or your learning agent might end up learning to do the wrong thing. You don't just want to learn to get from A to B; you want to do it while, say, obeying the rules of the road and doing so safely. So all these factors coming together make the problem quite difficult, and they're something we have to bear in mind while we're learning. Now, as we're learning in these spaces, it's also quite difficult to diagnose what's happening. Often the representation we use while learning is what's called a Q table: we try to learn something about the value, the goodness, of every action in every state. But then you end up with some sort of crazy table like this, which is hard to diagnose. We can't really see if it's doing the right or wrong thing, and indeed, if it's doing the wrong thing, it can be hard to tell why. For particularly complicated problems, people have turned to neural networks, which is where we get deep reinforcement learning, and the challenge there is even worse: you've got this massive neural network with millions of weights in it, and if something's not working exactly the way you wanted, it's very difficult to diagnose where the problem is. And then, as I alluded to with the reward functions, this is something really important: if we were to annotate everything good or bad that could happen as our agent moves around the space, that would be incredibly complicated, and if we make a mistake here or do something wrong, we could get really undesirable behaviors.
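To make the Q table idea concrete, here is a minimal tabular Q-learning sketch. The tiny chain environment and all hyperparameters are illustrative assumptions, not anything from the talk; the point is just what learning a value for every action in every state looks like.

```python
# Minimal tabular Q-learning sketch: learn Q(s, a), the estimated value of
# taking action a in state s, by trial and error. The 5-state chain
# environment and all hyperparameters are illustrative assumptions.
import random

N_STATES, ACTIONS = 5, [0, 1]        # actions: 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def step(state, action):
    """Toy dynamics: reward 1 for reaching the rightmost state."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

def greedy(state):
    """Pick the best-valued action, breaking ties randomly so the
    untrained agent still explores in both directions."""
    best = max(Q[state])
    return random.choice([a for a in ACTIONS if Q[state][a] == best])

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the current Q table, sometimes explore.
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        target = reward + GAMMA * max(Q[nxt])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = nxt

print(Q)  # even this tiny table hints at why large Q tables are hard to inspect
```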
So let's look a little bit at some of the research happening in my lab to tackle some of these problems. We're going to start off with questions around the action and state space. Imagine you wanted your little agent to learn to solve a problem such as this one. This is work done by my PhD student, Steve James. If you're trying to solve a problem like this, it's quite complicated: there's a lot you can do, and it's continuous. What you'd like to have is some sort of representation that tells you, if I execute a go-down-ladder action, I move from a precondition that says there is a ladder below me to an effect that says there's a ladder above me and the ground is below me. That's a much more understandable representation; it's easier to see what's happening when we work with our agents. Now, we've shown that we can learn these kinds of things. Fundamentally, what you'd like to see is that if I'm going down a ladder, the precondition looks like two symbols: I'm in this X position over here and this Y position over here, which means I'm at the top of this ladder, and the effect is that I'm in this new position and no longer in the old one; I've walked down the ladder. We'd like to be able to learn this, and that gives us a piece of knowledge that we can use transparently and understandably. The way we do this relies on a whole lot of other tools from machine learning: we have to do clustering, we use support vector machines to estimate what's required for these skills, and we use kernel density estimators to estimate the effects. But we get rules out that can be transferred between tasks. For example, we might get a rule that says: if my agent is at the top of a ladder and I execute this descend-ladder action, I end up in a situation where the ground is below me and a ladder is above me; or if I'm at a switch and I execute an interact action, then the switch changes position. So I can actually learn these rules, which are much more interpretable and understandable than the representations I was talking about before. We can also ground them, learning specific instances where this can happen, and by piecing this information together we can actually learn to solve these complicated problems: with planning, we can use these representations to solve the whole task. We can take it a step further: if we've got different actions that are relatable, we can compress them into a more general form, and we've been able to use this to solve very complicated, very long-horizon tasks in settings such as Minecraft. In fact, once you've got these objects, in work done by another PhD student of mine, Ofir Marom, you can use these ideas to get transferable rules that apply in multiple places. The idea here is that we might get a rule that says: if I'm in this world and I try to go east, and there's a box on one side of me and a wall on the other side of the box, nothing is going to happen. We can learn the specific dynamics of these objects. We've done a lot of theory to show that this can be learned efficiently, but the effect is that I could learn everything about this game from a small instance with 8,000 states, and with no extra learning I could understand everything I need to play a one-million-state level. All I have to do is plan at this point.
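As a rough illustration of the precondition/effect idea, here is a minimal sketch, assuming simple (x, y) states and synthetic experience data. This is not the actual method from the papers, just the flavor of using a classifier for "where can this skill run" and a density estimator for "what does it do".

```python
# A minimal sketch of learning a skill's precondition (an SVM classifier over
# states where it worked) and effect (a kernel density estimate over outcome
# states). The random data stands in for real agent experience (assumed).
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Assumed experience: (x, y) states where executing "descend ladder" worked...
states_ok = rng.normal(loc=[2.0, 5.0], scale=0.2, size=(100, 2))
# ...and states where it did not (agent not at the top of a ladder).
states_fail = rng.uniform(low=0.0, high=6.0, size=(100, 2))

# Precondition: a classifier that answers "can I execute this skill here?"
X = np.vstack([states_ok, states_fail])
y = np.hstack([np.ones(100), np.zeros(100)])
precondition = SVC(probability=True).fit(X, y)

# Effect: a density over the states the skill ends in (bottom of the ladder).
outcomes = rng.normal(loc=[2.0, 1.0], scale=0.2, size=(100, 2))
effect = KernelDensity(bandwidth=0.3).fit(outcomes)

# The (precondition, effect) pair is a symbolic, reusable rule: query it
# directly rather than inspecting millions of network weights.
print("P(executable at top of ladder):",
      precondition.predict_proba([[2.0, 5.0]])[0, 1])
print("log-density of landing at bottom:",
      effect.score_samples([[2.0, 1.0]])[0])
```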
All right, that was a little bit about states and actions, but can we do something with rewards as well? Well, it turns out we can. The question is: if my agent in this world knows something like how to collect blue objects and how to collect boxes, can I do more with that? I might want to ask: can I collect objects that are blue and are boxes? Or objects that are blue or boxes, but not a blue box? It turns out you can't do this with the traditional formalisms; they give you suboptimal behavior. So we've defined new structures, what we call extended value functions, that allow you to solve the individual problems optimally and then combine them. If I've learned how to solve collect-blue-objects, which looks like this, and collect-boxes, which looks like this, I can combine them optimally to solve other problems, such as collecting things that are blue or boxes, and this is what the optimal policy looks like, with no extra learning required; or things that are blue and boxes; or things that are not boxes; or even the more complicated specifications I gave at the beginning. All of this happens with no additional learning. The cool thing is that if we learn a set of base tasks, we can solve a super-exponential set of problems just by combining the knowledge we've already gained. What this really means for us, and why it's important, is that if we want to specify a really complicated task in this setting, we don't have to rely on a human doing all the right things and thinking of all the right numbers to put in certain places, where a mistake could mean something catastrophic happening, like a self-driving car crashing. Instead, we can define task specifications in terms of very simple things that we know work well.
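A minimal sketch of this Boolean composition idea follows, assuming goal-reaching tasks whose learned Q-functions are simply stored as arrays indexed by state and action. The tiny random tables stand in for learned values, and the operators shown (max for OR, min for AND, and negation via the values of the maximal and minimal tasks) convey the spirit of the approach, not a faithful reimplementation of the extended value function framework.

```python
# Illustrative Boolean composition of value functions over (state, action)
# tables. Q_max / Q_min are the (assumed known) values of the "always
# rewarded" and "never rewarded" base tasks, which the negation needs.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 6, 4

Q_blue = rng.uniform(0, 1, size=(n_states, n_actions))  # "collect blue things"
Q_box = rng.uniform(0, 1, size=(n_states, n_actions))   # "collect boxes"
Q_max = np.ones_like(Q_blue)                            # maximal-task values (assumed)
Q_min = np.zeros_like(Q_blue)                           # minimal-task values (assumed)

Q_or = np.maximum(Q_blue, Q_box)                # blue OR box: no extra learning
Q_and = np.minimum(Q_blue, Q_box)               # blue AND box
Q_not_box = (Q_max + Q_min) - Q_box             # NOT box
# blue XOR box = (blue OR box) AND NOT (blue AND box):
Q_xor = np.minimum(Q_or, (Q_max + Q_min) - Q_and)

# Act greedily with respect to the composed values.
print("composed policy:", Q_xor.argmax(axis=1))
```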
So what I've shown you today is that we've got a lot of work moving towards keeping reinforcement learning understandable, through ideas such as symbolic representations and composable task descriptions. There's a lot of exciting work in this space, and hopefully it will lead us towards a future where we can use these kinds of techniques for learning behaviors in a way that is safe, interpretable, and understandable. Thank you very much. Great, thank you, Dr. Rosman. I know you're not available to stick around later for the Q&A, so I'm going to throw one to you really quickly right now. I'm a video game player, so I love the idea of reinforcement learning being experimented with in a game setting. But how would you respond to a criticism that says you're just playing video games, and that's not going to transfer to real-life scenarios? If that were a criticism coming to you from a viewer, not me, how would you respond? So, that's a very valid concern. The reason video games and robot simulations are usually where reinforcement learning algorithms are used is largely that you can't just learn from a dataset: you need some idea of what happens when you try different things. You might be looking at counterfactuals: if I was in this situation and went left versus right, what would happen? So yes, that's certainly the case. Now, there's a lot of work at the moment trying to move this into the real world. Specifically, there's a lot of work around: if I start learning something in a simulated world, can I then transfer it to the real world? Which is to say, maybe I've got a self-driving car; I don't want to learn from scratch there, because that sounds like it could be slightly troublesome, but if I can do a lot of my learning in simulation and then just fine-tune it and tweak it for real roads, that's an important thing to do. The other thing people look at is trying to incorporate expert advice, or data that you've got offline, to help you learn either offline or in a safe way, so that you do the trial and error in very safe ways. We've been doing some work where you can ask experts, ask a human, if you're not sure what's happening in certain situations. So there is a lot of work in this direction, but yes, the bulk of the history of reinforcement learning is very much in games and robots. Great, okay. Again, thanks for that; I think this was a really nice run through a technical talk in just a few minutes, so again, thanks for your time. We're now going to transition into the second half of today's seminar, where we have three more speakers who will look at explainable AI, out-of-distribution detection in dermatology datasets, and representation of skin tones across medical images. Those are the highlights of the remaining three talks, and once again, following this morning's pattern, we're going to have a combined Q&A session for the next three speakers at the end. So I now want to introduce our next speaker, Dr. Vera Liao, who is going to take us through what it means to have a human-centered approach to this really cool area. Take it away. Thank you. Thanks, Skyler. I hope everyone can see my screen. Thank you for having me here. As a human-computer interaction researcher, I'm super excited that we have a whole session dedicated to human-centered AI, and my talk is going to focus on explainable AI. The previous talks, including the keynote, have mentioned the many different 360 open-source toolkits released by IBM Research; AI Explainability 360 (AIX360) is just one of them. As an HCI researcher, I often see the release of these toolkits as the beginning of my work. We're interested in how we take these toolkits, these different techniques, and build real-world applications. How do we design the user experience? And as a whole, how do we navigate the design space of this emerging trustworthy AI technology? How can we take this toolbox of AI algorithms and make it a toolbox of design material? Hopefully by the end of the talk, you'll have some idea of how we do this kind of bridging work. So when we think about applications and users, we often have to take a broader view of explainability: we consider anything that helps people understand AI better to be an explainability feature, not just explanations of the model's decisions or of a particular part of the AI. For example, we often think about explaining supervised machine learning. What is supervised machine learning? You build a machine learning model with a set of training data, with different instances that have labels and share certain features. When a new instance comes in, the model can make a prediction, say, that this is a cake. The majority of the XAI techniques available in AIX360 focus here: explaining how the model arrives at the decision that this is a cake. But users might also be interested in an explanation of the training data, and we have techniques in AIX360 addressing that.
They may also be interested in different kinds of model facts, like performance and limitations; there's another effort, AI FactSheets 360, that can give you some solutions for that. So, explainable AI, or interpretable machine learning, has really become a buzzword recently. One reason, of course, is that AI is increasingly used in high-stakes domains, so if the AI makes an error, the consequences can be very problematic. Users in those situations will also be more cautious: if you hand them a black-box model (personally, I prefer the term opaque-box model), they might be hesitant to use it, so there is a trust and adoption issue. Another reason there is currently such an active technical field producing many new techniques is that, at least in the average setting of training a machine learning model, there is a performance-explainability trade-off. On one end, you have simpler models, like linear models and decision trees: they're relatively easy to understand, but they don't perform that well. On the other end, we have the very popular deep neural networks and ensemble models: they perform very well, but they are complex and opaque, and they don't necessarily follow human-understandable logic. When you use those, you often have to apply a new set of techniques or algorithms to generate explanations; that's what we call post-hoc explanation. One example is a very popular algorithm called LIME. LIME takes the deep neural network you've already built and the particular prediction you want to understand, looks at the input and output, samples and labels nearby instances, and builds a simpler model in the local region, say a linear model, then uses that simpler model to explain the complex model's decision in that local region. The form of explanation is what we call feature contribution: it highlights that this instance is predicted this way because it has certain features. LIME is very popular because it can be applied to any kind of data and any kind of model. For an image, it will highlight the patches of the image, the superpixels, that contributed to the decision; for text, it can highlight the keywords that contributed to the model's prediction.
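To make this concrete, here is a minimal sketch of using the open-source lime package on tabular data; the dataset and model below are stand-ins for illustration, not anything from the talk.

```python
# A hedged sketch of LIME on tabular data with the open-source `lime`
# package. The iris dataset and random forest are illustrative stand-ins.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    training_data=data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    discretize_continuous=True,
)

# Fit a simple local surrogate around one instance and report the features
# that contributed most to this particular prediction.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=3
)
print(explanation.as_list())
```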
So LIME is just one example; there's a whole technical landscape, with many different kinds of explanations, and I'm not going to delve too deeply into that. If you're interested, a place to start is to check out AIX360; we have a whole section about the technique landscape, and there are also course materials we taught at recent conferences. I'll give you examples of what these different explanations look like. As I mentioned, my interest is in what we do with this kind of toolbox: how do we take these techniques and design real-world explanations, real-world applications? I often take the view that our HCI research sits in the in-between space between having this toolbox of XAI techniques and building real-world AI applications. When you're in this space, there are two high-level questions you need to answer. One is how to select: how to select the most appropriate tool given an application, a user group, even an interaction. The other is how to translate: how do we do the design work to translate AI reasoning into the user's terms? And by doing work in this in-between design space, we also hope to bring insights from real-world applications, about where the gaps and opportunities are, back into the technical community to inspire new algorithms and new technical work. My HCI colleagues and I have done different kinds of work in this design space, and we have a few different publications. But today I want to focus on one study, just to give you an example of how we approach this problem, and also to give you a point of view on how you might use AIX360 and how you might want to design the XAI user experience. To understand this design space, one project we did was to talk to people who also work, or are likely to work, in it: designers working across IBM's different AI products. We wanted to understand their view of the design space of explainable AI and the design challenges they face. We started this project in summer 2019, right after the release of AIX360, and very quickly we ran into a challenge: how do we talk about a technical space that people are not quite in yet? Designers might not be familiar with these algorithms, and there was no shared language to even talk about XAI. So we decided to create a study probe, which is the HCI term for a concrete representation to ground our discussion. We created a list of algorithm-informed XAI questions, based on the following assumptions. The first assumption, and if nothing else I want you to bring home this idea, is that user needs for explainable AI can be represented by the kinds of questions users ask: a why question, a what-if question, and a how question each need different kinds of explanation. We also assume that a question can be addressed by one or multiple explainable AI methods; for example, local feature contribution can answer the why question: why is this classified as a wolf? And a method can be implemented by multiple algorithms: LIME is one example, and there are other algorithms that might differ in certain computational properties. But we wanted to stay at the granularity of the explanation methods available in the literature, in the technical world. So we did a big literature survey, arrived at a list of explanation methods, and mapped them to the user questions they can answer: a local explanation answers a why question; a counterfactual explanation answers a why-not or what-if question. And since we take a broader view of explainability, we also added three other categories about model facts: questions related to the data, the output, and the performance. So we have nine categories of questions, and we brought them to our designers. We talked to 20 designers working across 16 different AI products. We asked them to first tell us: what is the AI system or application you work on, and what are some common questions users have for understanding it? Then we walked through each question card and discussed: do they apply, did we miss anything, and why would a user ask those questions? The middle two steps were essentially, quote unquote, designer-sourcing a common list of questions people have for understanding AI. And one contribution we tried to make is to build what we call the XAI question bank.
It's a list of common questions users have for understanding AI systems, organized into these nine categories. I will come back and discuss how we might use this question bank, but first I'll quickly talk about the design challenges we discovered. The first is that there is huge variability in XAI needs, meaning users will ask different kinds of questions for different applications and different points of use. The reason is that, ultimately, people are looking for explanations for different objectives, and in the paper we summarize the common objectives. To give you two examples: one very common reason people want AI explanations is that they want to gain further insights so they can make more informed decisions or take better action. For example, IBM has a supply chain system with AI behind it that gives you a prediction that a delivery might be late. But users really want to ask the why question, and also the how-to question: how can I reduce the predicted delay? The answer determines their follow-up action: if the weather is the reason, they cannot do anything about it, but if it's stuck somewhere, they can make a quick phone call. Another very common objective is that people want to evaluate the AI's capability, which is especially common at the onboarding stage, when people just start using a system. They want to ask performance questions: how well does it perform, and when might it fail? And some will also ask the how question: how does the AI make decisions in general? Another challenge people struggle with is that, at the current time, there's still a significant gap between the output of these algorithmic explanations and human explanation. For example, human explanations are selective: we don't give the whole causal chain, we give the most important reasons, and we tailor them to the recipient. These are still big technical challenges to address. What also opened my eyes in this work were the process-oriented challenges that designers and product teams face in general. The first is that it's still very challenging to navigate the technical space, given that this is emerging technical knowledge and everything is advancing very fast. The core problem is that you have to find the right pairing between what's right for the user and what's doable given the technical feasibility. Another challenge is the implementation cost: in the product team, the designer has to convince the data scientists that this is the right solution, and sometimes this discussion comes in too late. This is what we call technical debt: the model is already built, and the product team might not be willing to invest in or prioritize explainability. So, what came out of that work after the paper was published? In the past year, one thing we did was develop a design process we call question-driven XAI design. We have practiced it with several product teams, and we're also trying to incorporate it as a standard design thinking framework for IBM design that can be handed to product teams and our clients. I'll go through it very quickly. First of all, it's a user-centered design process: we encourage starting with user research, understanding the users and the questions they ask. We also encourage designers and AI engineers to work through this process together to find the right solution.
And we suggest that you can use the question bank as a checklist to identify, for my application and my user group, what questions users are going to ask. Another piece of work we did is the mapping: we take the question categories in the question bank and map them to the XAI techniques, the algorithms, that can answer each question, focusing particularly on ones available in open-source toolkits like AIX360 and others. I can share the slides; everything here has a URL link, and you can use that to see the different open-source toolkits. By doing the mapping, we arrive at design guidelines for which explanations can answer which questions, grounded in technical feasibility. The core ideas here are these. One, we want to reframe the technical space: instead of thinking "is it feature importance?", we think about what kind of user question a technique can answer, which encourages practitioners to foreground user needs instead of technical feasibility. And two, we want to treat these questions as boundary objects, where designers understand the reasons and objectives behind the questions, and data scientists can look into the technical details to implement them, so together they can find the right solution.
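As a condensed illustration of that mapping idea, here is a sketch of a question-category-to-method lookup. The categories follow the question bank described above, while the method names are paraphrased from common open-source options; this is an assumed, simplified table, not the official mapping from the paper.

```python
# Illustrative sketch: map XAI question-bank categories to candidate
# explanation methods. Both the categories and the method names here are
# paraphrased assumptions for illustration, not an official lookup table.
XAI_QUESTION_TO_METHODS = {
    "why":         ["local feature contribution (e.g. LIME, SHAP)",
                    "local rule or case-based explanation"],
    "why not":     ["counterfactual / contrastive explanation"],
    "what if":     ["counterfactual probing of input changes"],
    "how (global)": ["directly interpretable model",
                     "global surrogate or rule extraction"],
    "performance": ["model facts: accuracy, uncertainty, limitations"],
    "data":        ["training data documentation and statistics"],
    "output":      ["description of output scope and usage"],
}

def suggest_methods(user_question_category: str) -> list[str]:
    """Look up candidate explanation methods for a user question category."""
    return XAI_QUESTION_TO_METHODS.get(user_question_category, [])

print(suggest_methods("why not"))
```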
So we have this four-step design process. I have a working paper if you are interested in the details, but the high-level idea, as I mentioned, is to start with user research: start by understanding what kinds of questions users ask, whether a why question or a data question. Then analyze the questions: understand which are the priorities and what objectives you should achieve. Then try to map the questions to the modeling solution; you can use that mapping table. And after that, start on the technical solution, start the design, and iteratively evaluate and redesign to close the gap. Very quickly, in that paper we show an example where we practiced this design process to build an explainable AI system for healthcare adverse event prediction, which tells a doctor whether a patient has a high or low risk of an adverse event. We started with user research to understand what kinds of questions users have for that kind of AI system, and you can see that the ultimate design solution corresponds to the different questions, including facts about the data and the performance aspects that users are interested in understanding. So that's the work we do, and as I mentioned, we're working this into IBM's design thinking framework. At the end of my talk, I want to give a shout-out to our brilliant IBM design organization, and I recommend two links. One is the design thinking website: IBM has a long tradition of practicing design thinking, and it's a set of frameworks, guidelines, and courses to help you and your team be design thinkers. You don't have to be a designer; the goal is to get a team working together to solve complex problems and to innovate. The other is our very brilliant IBM Design for AI website, which guides you through thinking about the design problem and where AI can play a role: if you have a problem, if you have an application, what are the ethical issues to think about, and what are some design guidelines? Again, explainability is just one of them. So with that, I'd like to conclude my talk, and thank you for your attention. Thank you very much, Dr. Liao. Once again, I'm going to try to seed a question with you so you can think about it while you stay on for the later session. One of the questions posed on one of the channels here was: have we really gotten to explainable AI? It's a hot topic; are we there? So you can have a candidate response ready in a couple of minutes at the Q&A panel, but I think that's a really great question to unpack as part of the session. So again, thank you, Dr. Liao. I now want to continue this three-speaker session by introducing Dr. Celia Cintas. Celia is actually here in Nairobi, Kenya; she's part of the AI Science team here and does a lot of great work trying to better understand machine learning models, and GANs in particular. So I'm really happy to introduce Celia for the next 20 minutes, where, following the transition from the unique viewpoints we had earlier in the seminar, she's going to share a few more technical examples of what IBM is doing in this space. Celia, please go ahead. Thank you, Skyler, for the introduction, and thanks everyone for joining the seminars today. In this presentation, I want to discuss how we can evaluate skin tone representation in machine learning solutions for dermatology, and how we can enhance the robustness of existing models by detecting out-of-distribution samples in off-the-shelf models. But before we start, I want to give a shout-out to the whole team that works on the topics I'm going to share today: researchers and machine learning engineers across different labs, and other collaborators. We have master's students from CMU Africa in Kigali, and we also have domain experts in dermatology from Stanford University. As Kush already mentioned, in the medical fields, and as we find in dermatology, we see these disparities. For example, in African-American populations, melanoma is often diagnosed at an advanced stage, and five-year survival rates for acral lentiginous melanoma (ALM) are higher in Caucasian populations than in African-American patients. Recently, due to COVID, dermatologists started a registry to catalog the skin manifestations of the disease, and currently they report a low number of samples for African-American and Latin American patients. Of course, this is already known in the healthcare domain, and there are people working on materials such as Mind the Gap, from Malone Mukwende, to document clinical signs on darker skin tones. But when we think about translating these solutions to machine learning applications, we need to think about, and audit where possible, the disparities that could be carried into our solutions or even exacerbated. So in this presentation I'm going to go over two key questions. First, studying whether the dermatology image datasets currently used in machine learning are biased with respect to skin tone. The second question is about robustness: are the models we train for diagnosis robust to changes in clinical settings or to samples of unknown diseases? There is a lot of work done in the field of machine learning and dermatology. On one hand, we have several state-of-the-art models for skin disease diagnosis; for example, a few years back there was a benchmark model for melanoma diagnosis that outperformed dermatologists. There is also a large community around the ISIC challenges, where you can access different datasets and work on different tasks like segmentation of lesions, diagnosis, and so on.
On the other hand, there is a body of work on fairness in computer vision regarding skin types and gender, like the groundbreaking Gender Shades work of Buolamwini and Gebru, and papers on pedestrian detection across different skin tones. The idea of this work was to bridge these two worlds: take models that were already trained for diagnosing melanoma and analyze how they perform, and how the dataset is built, across different skin tones. The first piece of work was presented by Tim, Newton, and Mika in 2020, and what it shows is a framework to stratify both the performance of the model and the datasets across different skin tones. We'll go quickly through each of these boxes and see the results. For the next couple of slides, we're going to show some skin disease examples that could be sensitive or triggering to some viewers, so we want to flag this so you can engage or disengage with the slides as necessary. For this first piece of work on skin tone representation, we worked with two very well-known datasets, ISIC 2018 and SD-198. On one hand we have dermoscopic images of multiple skin conditions, and on the other hand we have clinical images: these show different body parts, with changes in the hardware, in how the image is taken, and in the protocol, so we will see that they are a bit harder to evaluate and work with. The first step of the pipeline is being able to differentiate pixels that belong to a lesion from pixels that belong to the rest of the skin. For that, we fine-tuned a Mask R-CNN that gives us binary masks, as you can see on the bottom right. The black pixels are the ones we use to estimate the skin tone, and we exclude the pixels inside the white area, which belong to the lesion, since we don't want to consider them when estimating skin tone. We used pre-trained models for two reasons: first, to keep the model from overfitting to, or memorizing, the dermatology examples we show it; and second, because then we only need to train the model a little bit more, so we don't need a huge footprint of memory and resources. After we have these pixels, we need a metric for the skin tone. We take the non-lesion pixels and compute the individual typology angle (ITA), a metric that is highly correlated with the melanin index, so we can use it as a proxy. After we have this value, we need to bin it into different categories. Of course, we can adjust these bins depending on the population and on dermatologists' advice, but the rough idea is that anything above 45 is going to be a very light skin tone, a very light complexion, and anything below 10 is going to be a darker skin tone. Just to give you an idea of how the ITA computation and the segmentation work together, you can see the masks on the bottom and the ITA values on the top, and you can get a sense of which type of skin we are looking at. When we move to the second dataset, our accuracy drops a little and we have higher error rates in the ITA estimates because, as mentioned before, the pictures in this dataset show different body parts, different hardware, and different protocols mixed together; we are currently working on how to deal with these more complex datasets.
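To make the ITA step concrete, here is a minimal sketch of the computation described above, ITA = arctan((L* - 50) / b*) in degrees, averaged over non-lesion pixels. The mask and the category thresholds here are assumptions; as noted in the talk, the bins can be adjusted with dermatologist advice.

```python
# A minimal sketch of individual typology angle (ITA) estimation:
# ITA = arctan((L* - 50) / b*) * 180 / pi over non-lesion skin pixels.
# The mask and category thresholds are illustrative assumptions.
import numpy as np
from skimage.color import rgb2lab

def estimate_ita(rgb_image: np.ndarray, non_lesion_mask: np.ndarray) -> float:
    """Mean ITA (degrees) over pixels where non_lesion_mask is True."""
    lab = rgb2lab(rgb_image)                    # channels: L*, a*, b*
    L = lab[..., 0][non_lesion_mask]
    b = lab[..., 2][non_lesion_mask]
    ita = np.degrees(np.arctan2(L - 50.0, b))   # arctan2 avoids division by zero
    return float(np.mean(ita))

def skin_tone_bin(ita: float) -> str:
    """Coarse binning matching the talk: above 45 very light, below 10 darker."""
    if ita > 45:
        return "very light"
    if ita > 10:
        return "intermediate / tan"
    return "dark"

# Usage on a random stand-in image with everything unmasked:
img = np.random.rand(64, 64, 3)
mask = np.ones((64, 64), dtype=bool)
print(skin_tone_bin(estimate_ita(img, mask)))
```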
When we look at the distribution of the datasets, we can see that they are highly skewed towards Caucasian and very light skin tones; we only have a few samples of darker skin tones in both cases. This is something we need to be explicit about when we share the models we train and the type of data they were trained on, so we are aware of the populations for which they may or may not work. Diving into the second part: the robustness of these machine learning models. As we see growing interest in taking models that can diagnose a given condition into the dermatology space, we need to address aspects such as how robust they are to different changes and how fair they are to the populations they serve. In this case, we look at two different out-of-distribution scenarios. First, what happens when the clinical setting changes: different hardware captures the input data, or we have different illumination settings, and so on. Second, unknown disease classes: what happens if my model was trained on ten conditions and then receives a new one? How will the model make a prediction, or at least convey how uncertain it is about that new class? This work is being presented this week or next at KDD by Hannah, one of our interns working on OOD detection. The main idea is, again, to take skin disease diagnosis models and known datasets, keep the ability to stratify the data by skin tone, and also use those trained models to detect OOD samples. When we say we work with subset scanning here, the idea is that we treat the neural network that diagnoses a particular skin disease as a data-generating system, and we apply anomalous pattern detection on top of that. Subset scanning lets us search a huge combinatorial space to find the group of activations that differs the most from what we expect as normal in a given layer of a given network. Some benefits of this type of approach: the model is already trained, and we don't need access to the code; we only see the model running. We can provide detection improvements at runtime, when the model is already in production. Because we are looking at the activation space, we can abstract away from the domain: the network may process audio, healthcare signals, or images, and the method still holds, because we work in the activation space. And on the other hand, we don't need to retrain these models or spend extra compute time on them, and we don't need labeled examples, since we run after the model has already been trained. In our experiments, the assumption is that the activations from an unknown condition, or from samples from a different clinical setting, have a different distribution than the activations of normal samples. The idea is that activations from these unknown images will deviate in distribution from the normal samples. But let's see how we measure that difference. For that, we need to score each sample at a given node, and we use scoring functions that measure how much these values deviate from uniform. With subset scanning we can use parametric scoring functions, like Poisson or Gaussian, but the distribution at each layer can be different: it can be bimodal or skewed. So in this case we use non-parametric statistics.
That way, we make minimal assumptions about what the distribution looks like for a given activation in a layer. But how does this all fit together in a pipeline? We have our trained model that diagnoses skin conditions, and we have samples that we know the model was trained on, which are known or expected. Then we have images that may or may not be from a new class, or may have an issue with how the image was taken. We need to extract the activations for both cases: the known ones, to build the expected distribution, and the unknown ones, to evaluate. Once we have these two distributions, we can compute the proportions mentioned before, and then search for the subset that maximizes the divergence of proportions between what we observe in the new images and what was expected. Once we have this set of nodes, we have valuable information: on one hand, we can flag a sample as anomalous, something that should be checked; on the other hand, we have the nodes responsible for flagging the sample, which can give more information to the user.
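Here is a heavily simplified sketch of that scanning idea: compute empirical p-values for one test sample's activations against a background of known samples, then search over thresholds for the node subset that most exceeds its expected proportion of small p-values. This is a toy Berk-Jones-style scorer over assumed synthetic data, not the full method from the paper.

```python
# Toy subset-scanning sketch: empirical p-values of a test sample's
# activations vs a background of known samples, scanned with a simplified
# Berk-Jones-style score. All data here is synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(2)
n_background, n_nodes = 500, 64

background = rng.normal(size=(n_background, n_nodes))  # activations of known samples
test = rng.normal(size=n_nodes)                        # one incoming sample
test[:8] += 3.0                                        # inject an anomalous pattern

# Empirical p-value per node: how extreme is the test activation vs background?
p = (1 + (background >= test).sum(axis=0)) / (1 + n_background)

def berk_jones(n_alpha: int, n: int, alpha: float) -> float:
    """KL divergence between observed and expected share of small p-values."""
    obs = n_alpha / n
    if obs <= alpha or alpha <= 0 or alpha >= 1:
        return 0.0
    score = obs * np.log(obs / alpha)
    if obs < 1:
        score += (1 - obs) * np.log((1 - obs) / (1 - alpha))
    return n * score

# Scan over thresholds: which alpha (hence which node subset) maximizes the
# divergence from the uniform null?
best_score, best_alpha = 0.0, float(p.min())
for alpha in np.sort(p):
    score = berk_jones(int((p <= alpha).sum()), n_nodes, float(alpha))
    if score > best_score:
        best_score, best_alpha = score, float(alpha)

anomalous_nodes = np.where(p <= best_alpha)[0]
print("score:", round(best_score, 2), "flagged nodes:", anomalous_nodes)
```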
Some preliminary results from this work were very interesting. First, the detection patterns we found for new classes are different from the ones for changes in the hardware or the clinical setting. That's important, because we can not only say that something is off with a sample; we can say it looks like the pattern of a new-class problem, or like the pattern of a change in the clinical setting. On the other hand, we also see varying performance for samples with darker skin tones, and the instability we see here for darker skin tones may be partly because the training dataset heavily lacks samples of darker skin. So the conclusions: the three datasets we explored across these two pieces of work were heavily skewed towards Caucasian populations, so there is something to think about regarding the representation issue in these datasets. And we have a nice way to detect OOD samples when the model is already out there: maybe we weren't the ones who built it, but we are the ones using it, and we want to make it more robust. Something my colleague will talk about next is how we can translate these same pipelines to more complex settings and less curated datasets, like academic documents, papers, and textbooks: how can we translate this and learn about representation in other settings? If you're interested in this work, the Kenya lab has an amazing team working on machine learning and healthcare, maternal and neonatal outcomes, and automatic stratification of data. And if you find subset scanning interesting, we have other applications in the machine learning domain. Thank you very much for your time, and have a happy seminar. Thank you very much, Celia. So I have a question for you, and we'll answer it at the end of the session; maybe the same one for you and Girmaw as well. You're probably hoping for a technical question; it's not a technical question, though. One of the users commented that in these higher-stakes settings of medicine or law, where fairness is so important, we also know there's heavy regulation around data privacy; the most important cases are also the cases where data is perhaps coveted or protected most heavily. So can you comment at the end on that interplay between the importance of fairness and representation in datasets, and balancing privacy and access to data? That question is coming up at the end, after our final presentation by our last presenter, Dr. Girmaw Tadesse. Girmaw also joins us from the Nairobi lab, based here in Kenya. He's been with IBM Africa for about a year and a half, coming on two years, and has been doing some great work on better understanding machine learning and how it interplays with healthcare and with some Africa-specific contexts. So Girmaw, bring us home for the last 20 minutes before we return for the question and answer session. All right, Skyler, thanks for the introduction; I hope you can hear me well. Yes, I'm Girmaw, and I'm here to present our recent work on representation analysis in academic materials. This work has been done in collaboration with researchers across different IBM labs and external collaborators. The outline of the talk is as follows. First, I will comment on the challenges of learning from data, and when we say data, it's beyond just the curated datasets we often use to build our machine learning models; we could have data like traditional academic materials. Specifically, we will focus on healthcare, on dermatology, as a follow-up to my colleague's presentation earlier. The problem is formulated as: how could we analyze bias in the representation of different skin groups in traditional academic materials, like the books in the slide? For that, I will present our solution, followed by a conclusion. Probably most of us are aware of the typical data-driven pipeline, where we have a data source, a specific model is trained on it, and an outcome is inferred using the trained model. However, when the data is questionable, for example when there's a lack of representation for a particular sub-population, the outcome we infer from the model becomes questionable as well. So this is a growing research domain, as has been presented in the previous talks: given data curated for machine learning use, how do we identify and detect imbalance and irregularities? And there are different streams of work on correcting the detected irregularities, including retraining, correction, data augmentation, and so on. However, the challenge becomes even bigger when the "model" involved is actually a human being. As we are all aware, we have gone through different stages in our academic experiences and careers where we use traditional materials like textbooks, slides, and journals as our source of data to understand a specific problem or build our expertise. In this scenario, the detection of bias is tricky, and how to correct it also needs particular attention, because even in the common AI framework, we assume we have data, ideally annotated by domain experts, to train our machine learning algorithms. But the question is: what if the domain expert is actually biased because of the material they went through? That's the question we are trying to address in this talk. And this is tricky: as reported in a recent New York Times piece, understanding and correcting bias in people is probably much more difficult than correcting an algorithm. So why did we specifically pick dermatology?
It's just another subdomain where there is a very strong imbalance of representation across different subgroups, specifically across skin tones, dark and light, in those academic materials. These are reports, papers published by domain experts, saying that these textbooks actually lack fair representation of darker skin examples. So you may ask: what are the actual challenges if you study dermatology, a particular domain, and that material lacks fair representation? It may delay diagnosis, or even lead to a lack of expert access, if experts are not trained on enough data for a particular example. For instance, an expert may not have enough knowledge of how a particular condition could manifest itself in different or underrepresented skin categories. This may lead to increased morbidity and mortality, and overall degraded quality of care. So the question we are trying to address in this piece of work is: could we automatically identify the representation of skin images in dermatology academic materials across skin categories? And of course, the answer is yes. First, given these traditional materials, let's say a 1,000-page dermatology book, we could ingest the different entities in the textbook, which include tables, figures, abstracts, text and so on. Specifically, we go for the figures or images in these textbooks, and then by analyzing and detecting, using image processing pipelines of course, we could identify the representation of, for example, light versus dark skin tones in a given textbook. This gives real-time insight into the representation of different subgroups for domain experts who are learning, practicing or teaching a particular subject, specifically dermatology here. So the first step, as I said, was to ingest the different entities in a given traditional material. For that, we used an existing IBM tool called Corpus Conversion Service, a knowledge-graph-based machine learning platform that identifies different entities in a given textbook, or in a given chapter, or in a given page, actually. For a particular image, as you see in the example, we could identify where that image occurs on a page, with its exact horizontal and vertical coordinates, as you see in those examples. By using this information from the output of the CCS tool, we could of course extract the images identified in that textbook. As you may understand, we are focused on understanding the skin images and what their skin categories are in a given textbook. So the first step would be to filter the skin images out from the non-skin images we extract from the textbook, and that's where our skin versus non-skin classification becomes the first sub-task. In doing this, of course, we could always analyze the data: how different skin and non-skin images are across different skin tones, and even within different features like the histogram of oriented gradients and ITA, the individual typology angle, which Celia mentioned earlier. Overall, we start from a simple classification task, given four different textbooks that we labeled working with a domain expert.
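Since ITA comes up in both talks, here is a minimal sketch of how the individual typology angle is typically computed. The formula is the standard one from the dermatology literature; the helper function, the scikit-image conversion, and the aggregation choice are illustrative rather than what this pipeline necessarily uses.

```python
import numpy as np
from skimage.color import rgb2lab

def ita_degrees(rgb_image, skin_mask=None):
    """Individual typology angle (ITA) over the skin pixels of an image.

    rgb_image: float RGB array in [0, 1], shape (H, W, 3)
    skin_mask: optional boolean array marking skin pixels
    """
    lab = rgb2lab(rgb_image)          # convert to CIELAB
    L, b = lab[..., 0], lab[..., 2]   # lightness L* and blue-yellow b*
    if skin_mask is not None:
        L, b = L[skin_mask], b[skin_mask]
    # ITA = arctan((L* - 50) / b*), reported in degrees; lower ITA values
    # correspond to darker skin tones. Here we take a simple aggregate,
    # the ITA of the mean L* and b*; per-pixel ITA histograms are also common.
    return np.degrees(np.arctan2(L.mean() - 50.0, b.mean()))
```

Commonly cited ITA categories run from very light (ITA > 55°) through light, intermediate, tan and brown down to dark (ITA ≤ −30°), so a single angle per image gives a simple, thresholdable skin tone estimate.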
First, we tried a simple support vector machine, a one-class SVM trained only on skin images, to identify anything other than skin, but we extended that baseline by adding more features, for example the intensity values across different color spaces and ITA values, and a more powerful extreme gradient boosting algorithm. You can see we could comfortably identify skin images in a textbook and discard non-skin images. So now, given the textbooks, first we ingested the different entities, we picked only those which are images, and then we identified the skin images. The last step is to identify the type of skin category they belong to: for example, are they light or dark skin, right? So a bit of a warning: the couple of slides coming up will have skin images in higher resolution, so you may look away if you find that necessary. But coming back to the skin tone estimation: before we try to identify the type of skin category, we need to make sure that unnecessary parts of those identified skin images are discarded. That includes, for example, foreground objects or backgrounds which do not correspond to skin pixels, and the solution for that, as you know, is segmentation. When you try to isolate a particular region of interest in a given image, there are different methods you could apply. There are more sophisticated Mask R-CNN type frameworks to segment skin regions, but for this work we just went for the simple approach, using the color intensity values across different color spaces, followed by image processing techniques like watershed and morphological operations, so we could separate skin pixels from unnecessary or non-skin pixels, as you see in those figures. Given the original image, we try to mask out those unnecessary foreground or background parts, which may be clothes or other sources, and finally we have the selected pixels, as you can see, which we provide to the follow-up machine learning pipeline we employ to identify the type of skin category they belong to. So before we jump into that, let's check the ground truth information available in four different dermatology textbooks that are being used in top medical schools in the US and worldwide. Across these four textbooks, what we found consistently is a severe imbalance of representation of darker skins, which reconfirms the concern the domain experts were sharing in the introductory slides I showed you before. Across the four textbooks, darker skin images make up at most roughly 10%, while the remaining 90% are images of lighter skin tone types. But the question now is: so far these studies have been manual, right? What we are trying to do is see whether machine learning methods could identify those two categories automatically. Yes, we could do so. For that, we evaluated two branches of machine learning frameworks. One is traditional tree-based baselines like boosting, and the other is deep learning, specifically a ResNet framework. The maximum we could achieve with traditional, gradient-boosting sorts of classifiers is about 0.9 accuracy and F1 score, as you can see in the highlighted box; both of these kinds of frameworks usually rely on handcrafted features you need to extract and encode from the given data.
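As a rough illustration of the simple, color-based segmentation step described above, not the exact pipeline, and with heuristic threshold values rather than the settings used in this work, one could mask skin pixels with OpenCV like this:

```python
import cv2
import numpy as np

def skin_mask(bgr_image):
    """Boolean mask of likely skin pixels in a BGR image (OpenCV order)."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    # A widely used heuristic skin range in the Cr/Cb chrominance channels
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Morphological opening/closing to drop speckle and fill small holes
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask > 0

# Usage: pixels = bgr_image[skin_mask(bgr_image)] feeds the skin tone estimator
```

In many figures the morphological cleanup alone removes much of the clothing and background; a watershed pass, as mentioned in the talk, can further refine the region boundaries.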
However, as probably most of you know, when you use a deep learning framework you are not supposed to engineer features from the raw data; rather, you can just work with the raw data, and the framework can learn discriminative features from it. The second advantage of this deep learning pipeline is transfer learning. The pre-trained ResNet, for example, has previously been trained on natural images using, for example, ImageNet. That helps the model to understand basic image features, which we could also use to classify skin tone categories in our data. So as you can see, and as expected, the ResNet framework outperforms the traditional models, with much improved performance metrics highlighted in the green box. So I have told you about all the different pieces now, starting from a traditional book of, say, 1,000 pages: ingesting the entities, selecting the images, extracting the skin ones, segmenting the skin pixels, and classifying the skin tones. These have been done almost separately, as you followed in the presentation. What's missing is wrapping all of this up into a standalone tool that everybody, even outside IBM Research, could use to understand how representative their textbooks or academic materials are. A simple conclusion is that, of course, we could achieve encouraging performance in doing so. But the main concern is that the imbalance of data is critical, whether in traditional materials or in curated datasets, where, as we saw, the representation is found to be not good enough for darker skins. So this goes back to trustworthy AI and the perspective we have been talking about this afternoon: rather than collecting more and more data, it's better to focus on good data, and rather than more model-based analysis, the growing trend is now to focus on a more data-centric way of building meaningful and trustworthy AI solutions. As I said earlier, it is very important to look back at the kinds of academic materials that domain experts may be trained on, which may subconsciously or unconsciously bias their decision-making. And this work has been done in great collaboration, as I said, with different researchers in IBM, including our keynote speaker Kush from Yorktown and Peter from IBM Zurich. We also had great interns, Kenny Andrews and Hannah King, on this work, and external collaborators, dermatologists like Roxana and Professor James, from Stanford, the University of Pennsylvania, and Sloan Kettering Cancer Center. So I thank you all for joining this conversation, and I hope to see you soon. Thank you. Over to you, Skyler. All right, thank you, Girmah. And I think you've actually partially answered the question I posed to Celia: all you need is a PhD, lots of textbooks, a huge machine learning pipeline, and a panel of experts, and that's all you need in order to collect that data. So that's the answer to that question. Well, I'm joking a little bit in jest, but thank you. We've now had three talks, and these had an excellent, more technical feel about how IBM has actually been using some of these tools and exploring the disparate impacts of data and how these machine learning models are used. So I want to come back now and start off with some of the pre-seeded questions that I had.
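To show what the transfer learning setup Girmah describes typically looks like in code, here is a minimal sketch: a torchvision ResNet-18 pre-trained on ImageNet with its final layer swapped for a two-class light-versus-dark head. The specific architecture, depth, and training details in this work may well differ.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_skin_tone_classifier(freeze_backbone=True):
    # Start from ImageNet features learned on natural images
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    if freeze_backbone:
        # Keep the pre-trained features fixed; train only the new head
        for param in model.parameters():
            param.requires_grad = False
    # Replace the 1000-class ImageNet head with a 2-class one: light vs dark
    model.fc = nn.Linear(model.fc.in_features, 2)
    return model

model = build_skin_tone_classifier()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# The training loop over a labeled loader of (image, skin_tone) batches is
# omitted; with the backbone frozen, only the final layer's weights update.
```

Fine-tuning the whole backbone instead of just the head is the usual next step once the labeled set is large enough.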
Vera, one of the questions was basically: humans can't necessarily explain their own thought processes sometimes. So how do we expect a machine learning algorithm to do so? And can we really achieve explainable AI? Very interesting question. Yes, it's true that we don't necessarily understand how our brains work in every detail, but I think we're doing fine. We are able to move around in our everyday lives, and one point is that we have a lot of communication devices when we talk to each other. When I try to justify my decision, I don't have to tell you everything going on in my mind, right? I can draw out a logical reasoning; I can just tell you certain parts of the causes. That's also my view of explainable AI in general. I have a pretty pragmatic view. I don't necessarily think we need to make every trace of the AI completely transparent. The point is: how do we support people in achieving the necessary understanding for a given objective? And this can be achieved by post hoc explanations or example-based explanations. There are so many ways for us to achieve the necessary understanding; we don't have to understand exactly how the brain works, and we're doing fine. That's my answer. Thank you. All right, cool. Girmah, a leading question before I bounce to Celia. One of your final slides said good data, not big data. Can you expand on Andrew Ng's meaning? What is meant by good data? Just sit on that for a few seconds. Celia, all right: the privacy question, and how does that impact data availability? Which has been a theme here, because data availability then informs machine learning models and eventually flows down to the outcomes. So, any comment, at least from what you've experienced so far working with medical data? So I would say that something we can leverage in healthcare is that there is already a system in place, with IRBs making sure that the data collections and the analyses being done follow an ethical structure, let's say, or guarantee data privacy to the patients that donate the data, and so on. So I'm hopeful about the angle that healthcare already has all this infrastructure, which machine learning needs to adopt, and of course new rules are going to appear, because it's not just about the data, right? You train the model and you share the weights of that model with somebody else, and you have a proxy of the data. Of course, it's already processed and it's not directly accessible data, but it's still something to think about. But I do believe that healthcare already has a good platform regarding privacy and data access, for sure. Okay, cool. Great, thank you. Girmah, good data. Good question. Good is very subjective, but I think the main point, and what's getting more popular lately, is a more data-centric approach. Maybe a few years ago there was this understanding that if you are collecting more data, if you have big data and you put it through bigger frameworks, there's a tendency that it could work better and give better prediction performance, and probably that's not the case, specifically when we try to develop solutions that could be impactful. Before we even jump into models and analysis and so on, first we need to understand the data: the ground truth, the representation across different subgroups, should be something we investigate further before we jump into the model. So good is, as I said, maybe subjective, but the data should sufficiently reflect the problem you are trying to address, its ground truth, its representation, and so on.
So this is in line with some of the other work we do on automatic stratification: understanding the data, how it relates to different outcomes across different groups, before you jump into the model itself. So the focus, and I think the take-home message, is: given some data, don't jump straight into model analysis and designing big frameworks; rather, look back, analyze, and make sure it is the correct data, data you could actually use to solve the problem you are trying to address. All right, cool. I think that now ends the pre-seeded questions. There are a few more coming in live, and you guys will have to be on your feet. Vera, you mentioned in your talk a lot about the design thinking aspects. Do you have any more examples where design thinking has come in and then actually impacted downstream fairness? I know that's a little outside your area of research, but perhaps beyond the surveys and studies that you've done, IBM is big on design thinking, so maybe just a few examples where design thinking has come in and the client has come back and said, oh wow, I see now the advantage of that. Right, that's a very good question. Again, like the two links I recommended, there is a design thinking framework, which is IBM's design for AI. I think one of the core principles in that set of design thinking frameworks is broader than explainability. We have one exercise that has been running much longer, which gets a team to start by thinking about where AI should come in: if you have an existing problem, if you have an existing product with no AI yet, then you start from scratch on where the AI should come in. I think one core idea that that part of the framework does really well is to start from: what are the potential impacts? What are the potential harms? I think fairness is only one aspect of ethics. It's not just about, okay, we built this model, we need to de-bias the model; it's really important to start from the beginning: should we build AI for this particular problem? Is AI making the right decision? Is classification the right kind of setup? So I think that kind of design thinking framework, that design thinking exercise, is really good for raising these broader questions from the beginning and thinking about potential harm. So that's one example in that area. All right, cool, thank you for that. I'm going to seed one more question with Vera, but give her some time to think about it. One of the popular questions on the ask-a-question interface here is: why do small companies struggle working with AI? And I would basically ask: what sort of advice, from a design thinking point of view, would you give, sorry, airplane flying above, what sort of advice would you give to a skeleton crew, perhaps an African startup, that wants to use AI, when you know that's going to be a problem and many of them fail? So what sort of advice, from a design thinking point of view, could you give them? And then, while Vera is thinking about that, I have a question for the three other members of the panel here, all of us who live and work in Africa. One of the other popular questions is: there is an infrastructure and connectivity divide on the continent; how is that hurting AI progress on the continent, and is there anything that can be done about it? So that was a question that came up about connectivity and infrastructure, specifically in Africa, and how that may now be inhibiting progress, and then what can be done about it. I might jump in and give my take on this particular question while Celia and Girmah think about theirs.
I've now been here for almost eight years, and visiting for about 15, and the connectivity divide is real, but it is going to be changing, whether that is through cellular technology or satellite-based technology. So that digital divide is closing. I think the question that comes from that divide is: what are we doing right now as researchers with the time we've been bought? Because there is going to be another billion people coming online, and they are not going to have the experience and the literacy, perhaps, that we normally expect with internet use. What are we doing now as researchers to prepare for this last billion people coming online? I'm not going to argue that the connectivity problems aren't there, but I assure you they are disappearing, and disappearing rapidly, across the continent. And I think that actually raises a more pressing question for trustworthy AI, and for data generation or data gathering and how that is used: we are going to have new internet users joining from the continent very soon, and I think there are some really interesting trustworthiness questions that come from that closing of the divide. So that's my take on that particular infrastructure question. Celia, Girmah, you're both well versed on the continent. Any thoughts on that? Celia, you want to go first? Sure. Connectivity-wise, I think cellular networks, at least here in Kenya, are widespread, and in almost any spot you will have 4G. But then having access to that cellular data, that's another gap that needs to be thought about: not only connectivity, but access to connectivity. Regarding infrastructure: I'm from Argentina, so I'm Latina, and I did my PhD in a national research center. So doing a PhD on machine learning with restricted hardware access, that's something I know a few things about. But something that I do think is that that is changing; as Girmah says, not big data, good data. I think there is a real trend of trying to move these models to smaller devices so people can actually reproduce the results, and not only a few big labs can run these huge networks and host these enormous datasets. So I do believe