Hello, all. Thank you for joining our day two diversity and inclusion session. This session is on the importance of diversity and inclusion for algorithmic bias considerations. We have with us Dr. Ansgar Koene, who is the IEEE P7003 standards working group chair and also the global AI ethics and regulatory leader at Ernst & Young. Over to you, Ansgar.

Okay, thank you very much, Sukanya. And thank you for the invitation to speak at this conference. It's great to be able to speak at a conference that is reaching out to a wider audience, one that is often not really included in a lot of the discussions around AI ethics. One of the items I will touch on in my talk is where these conversations are actually taking place, and the lack of inclusion in the actual process of thinking around AI ethics and the related standards development. My talk today will focus on algorithmic bias considerations as one particular case of a standard addressing this. But before getting to that, I'd like to start with a general introduction to the issues around algorithmic bias and the lack of diversity and inclusion in AI and technology development in general, and provide a bit of context on the developments happening in the regulatory space.

As I'm sure you're all aware, algorithmic discrimination is an issue that is being discussed increasingly. It is frequently in the news due to the growing use of AI systems in applications that have real-world implications, such as sorting job applicants, face recognition as part of policing, and credit assessments. With the movement of AI out of the sphere of games such as Go and chess into real-world applications, it has become increasingly important to make sure that systems are not biased: that their decisions are appropriate to the stakeholders being impacted and take the relevant data into account. Typical causes of bias are a lack of inclusiveness in the data sets being used, but also in the people working on these systems, who are ultimately the ones deciding what kinds of issues need to be considered.

This is a brief overview of some of those cases. On the top left you see Joy Buolamwini, who made a lot of headlines when she talked about the big failures of face recognition algorithms, for instance failing even to detect the presence of a Black face, and the subsequent issues with their performance. One of the studies that Joy Buolamwini and her colleague Timnit Gebru did was the Gender Shades study, which looked at the types of databases used when building face recognition algorithms. On the left you see three of these: Adience, IJB-A, and a special one that they created, the Pilot Parliaments Benchmark (PPB). The colors in the bars indicate the distribution of dark and light faces, and also the intersection of dark male, dark female, light male, and light female faces in the training data sets. What you can clearly see is that the traditional training data sets, Adience and IJB-A, which are used by the industry leaders, are heavily skewed in the data they contain. As a result, their performance is also heavily skewed.
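To make that kind of composition audit concrete, here is a minimal sketch in Python, using invented annotations rather than the real benchmark data (the actual study used Fitzpatrick skin-type and gender labels): it simply tallies the share of each intersectional subgroup in a training set.

```python
from collections import Counter

# Hypothetical per-image annotations: (skin tone, gender).
# Invented for illustration; not the Adience, IJB-A, or PPB data.
labels = [("dark", "female"), ("light", "male"), ("light", "male"),
          ("light", "female"), ("dark", "male"), ("light", "male")]

counts = Counter(labels)
total = len(labels)
for (tone, gender), n in sorted(counts.items()):
    print(f"{tone} {gender}: {n / total:.1%}")
# A balanced benchmark such as PPB aims for roughly equal shares;
# heavily skewed shares flag a data set likely to yield skewed accuracy.
```

A check like this is obviously no substitute for the full audit, but it shows how little code is needed to surface the kind of skew the study found.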
And so what they did in the study was introduce an alternative data set that is balanced across these groups. Importantly, they showed that when companies actually made an effort to use this more balanced data set and generated better performance for the minority population, what they got was also better performance for everybody, including on the majority population. This makes intuitive sense: by introducing a more diverse data set, covering all the types of faces a system might encounter, you push the system to work on those features in the image that are truly relevant to distinguishing faces.

A different case that attracted a lot of attention was the COMPAS system. COMPAS is used in the criminal justice process in the US to assess whether a person who has been arrested should get pre-trial bail, and also whether somebody should be up for parole. The system has been introduced quite broadly in a fair number of states. The manufacturer has claimed that the system is not racially biased: they made a lot of effort to ensure that the accuracy of the system, shown on the left in the blue box, is well balanced across both white and Black defendants, and about as balanced as human performance would be. However, ProPublica journalists did a deeper analysis of how the system performs, looking not just at how often it makes an accurate or inaccurate judgment, but at what kind of errors it makes when it makes an error. Looking at the false positive and false negative rates, they identified that the types of errors the system makes for Black defendants and for white defendants are significantly different. For Black defendants, when it makes an error, it tends to judge the defendant as more likely to be a repeat offender and therefore not eligible for bail. For white defendants, it tends toward errors of assuming the defendant is not a serious risk.

This started an important discussion in 2016, especially because the manufacturer could claim that the system was not biased while the journalists were saying it is biased. How could a system be both biased and not biased at the same time? Looking into this more deeply raises the issue that there are multiple ways in which a system can be fair or unfair, biased or unbiased, and the problem that, unless you have very specific conditions in the underlying data, it is generally impossible for a system to be fair across all the different measures at once. This highlights that there needs to be a clear discussion as to which error metric is the most appropriate one to use in a particular context of use.
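To make the ProPublica-style analysis concrete, here is a minimal sketch in Python with made-up toy data (not the actual COMPAS records), constructed so that the two groups have identical overall accuracy but opposite error profiles:

```python
# Toy per-group error analysis. Data is invented for illustration.
# y_true: 1 = did reoffend, 0 = did not; y_pred: the system's risk call.
records = [
    # (group, y_true, y_pred)
    ("A", 0, 1), ("A", 1, 1), ("A", 0, 1), ("A", 1, 1), ("A", 0, 0),
    ("B", 0, 0), ("B", 1, 0), ("B", 0, 0), ("B", 1, 1), ("B", 1, 0),
]

for group in ("A", "B"):
    g = [(t, p) for grp, t, p in records if grp == group]
    accuracy = sum(t == p for t, p in g) / len(g)
    fp = sum(t == 0 and p == 1 for t, p in g)
    fn = sum(t == 1 and p == 0 for t, p in g)
    fpr = fp / sum(t == 0 for t, _ in g)   # wrongly flagged as high risk
    fnr = fn / sum(t == 1 for t, _ in g)   # wrongly cleared as low risk
    print(f"group {group}: accuracy={accuracy:.0%} FPR={fpr:.0%} FNR={fnr:.0%}")
```

Both groups come out at 60% accuracy, yet group A absorbs all the false positives and group B all the false negatives, which is exactly the sense in which a system can be "unbiased" on accuracy and still biased in its error types at the same time.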
Now, it's important to reflect a bit on the fact that issues around bias due to a failure of diversity in the testing population, in the population developing the system, and so on, are not new; they are not purely an AI problem. A classic example is the calibration of photographic film back in the 1960s and 70s. When Kodak, a major film manufacturer at the time, calibrated its film to get the best quality of image, it did so based on the majority population in the US, meaning white faces. The film was calibrated to best reflect images of white faces, and as a result it did not perform well on darker skin tones. This led to quite a lot of issues around intrinsic discrimination within the technology, and to a large extent it even carried over into the digital era, simply through reuse of the same color calibration, even though there is no longer a technical reason for digital technologies to have this issue.

A different case, a recent one that attracted considerable attention, was the use of photo sensors to trigger soap dispensers. People noticed that these systems, at least the initial ones produced, do not actually work with dark skin. Again, this is a careless oversight: when the people who developed the system tested whether it worked, they never bothered to test it with people with darker skin tones. There is no AI involved here; it is simply a case of having a photo detector and calibrating the level at which it should trigger. But it indicates an intrinsic problem: without a diverse set of people involved in the development, these kinds of things just did not occur to the people working on it.

And finally, one more example, without claiming this is an exhaustive list, to highlight that issues of diversity are not purely questions about things like race or gender. A huge area of concern when it comes to the internet is diversity in relation to age or mental capacity. Internet services generally do not test or ask for your age or mental capacity when you engage with them. They treat everybody equally, but what that means is that they treat everybody as if they were an adult, effectively a middle-class, Western kind of person. This has led to concerns: organizations such as the 5Rights Foundation, which works for the rights of young people online, and the UN are trying to highlight this issue and to push technology companies and internet service providers to address it. The UN Convention on the Rights of the Child is currently being reviewed to create a new general comment. General comments on UN conventions are basically a way of describing how to interpret the convention in relation to a particular topic; this one concerns children's rights in relation to the digital environment, and it highlights exactly these issues of the digital environment assuming that everybody is an adult.

So, having reflected on the fact that these issues arising from a lack of diversity in the developer community are not purely related to AI systems, let's reflect a bit on why, with the introduction of AI systems, this has become such an issue in the news and has attracted so much more attention from regulators.
One way to distill this down is what I'm pointing out here: AI systems, the way they are currently being applied for automated decision making, have a tendency to reduce complex individuals to simplistic binary stereotypes. What do I mean by that? Here is an example; it is a real one, though at the extreme end of the spectrum of application spaces. This was a paper that came out back in 2017 by two academics from a business school, who used a machine learning system trained on faces from dating websites to try to see whether there is something in the facial features themselves, the physiology of the face, that would indicate whether somebody is heterosexual or homosexual.

This is a textbook example of how multiple sources of problems can arise. The first is the way the problem was framed. Even if we were to accept that it is valid to look at whether physiology indicates sexual orientation, the way the project was framed immediately assumed that sexuality is binary, that you can only be homosexual or heterosexual with nothing in between, whereas sociology tells us this is not the accepted scientific understanding; rather, sexuality is on a spectrum. Anyone who is bisexual would not be classified properly. Then there are the questions about the data sets they used: accessing data from a dating site without getting consent, and assuming all kinds of things about the responses people gave. And then they used this still relatively small data set to establish a stereotype of what a person is and how that relates to sexual orientation.

So one of the issues is that a lot of the machine learning systems being used ingest large data sets covering various populations, try to distill those down to certain categories or groups, and make predictions based on that. In effect, what they are doing is what we would classically call stereotyping. They are saying: you are of this type, therefore you will have this kind of behavior, without directly engaging with the person. This is one of the triggers for why this has become such a sensitive area and has attracted so much regulatory concern.

Now, diversity. I've mentioned a couple of times that one of the reasons these issues come into the development process, and even end up in the final product, is simply that the development team is insufficiently diverse. If you don't have somebody on the team who is from a given minority group, you are much more likely not to notice that you have made an assumption about how people are that does not apply beyond the bubble of people in the group. Here are some statistics that came out earlier this year, in June, on how the big tech companies, basically the Silicon Valley companies, are performing when it comes to employing a more diverse workforce. What they show is that the percentage of Black workers in big tech in the US is roughly around 3.5%, and it hasn't really improved much over the years. Apple is doing somewhat better, at around 8.5%.
However, if we compare that to the US census, Black and African Americans are 13.4% of the population, and Hispanics are another 15.3% (whether Hispanic workers were counted among the Black workers in these figures is not entirely clear). Even if we consider only the census segment listed as Black or African American, Black workers in big tech are at roughly a quarter of that share, or about two-thirds of it in Apple's case, so well below parity with the demographics of the US. And it is debatable whether parity with the demographics is actually what you should be aiming for. Even if a population is much smaller, say Native Americans, at just 1.3% in this census, you would still want some representatives who can at least think about the types of issues that this population might have, because you will still be impacting them.

So who has been thinking about these kinds of issues? As I've mentioned, these problems around bias and discrimination in AI have attracted quite a bit of media attention; a lot of the slides I showed had snippets from newspapers. As a result, we have had a huge proliferation of AI ethics principles. This has been the first response. A lot of governments have held parliamentary inquiries and the like to try to figure out where exactly the problem lies. They don't want to move immediately to something like regulation, because there is a concern that moving too fast with regulation would interfere with innovation and reduce their ability to compete with neighboring countries. So the focus currently has been on AI principles. Here you see some of these: the European Commission had a High-Level Expert Group working on principles; one of the earliest sets, from a more academic space, was the Asilomar AI Principles; a lot of big tech companies have come out with their own sets of principles; the IEEE produced Ethically Aligned Design, a book that goes into a bit more depth but is effectively also a discussion of the principles and ethical concerns to think about; and the OECD came out with a set of principles, which is particularly important because these were subsequently adopted by the G20 and have effectively become more or less the standard that most countries look to as the basis for national AI strategies, national reflections on AI principles, and potentially, going forward, the basis for AI regulation.

The Berkman Klein Center has done a very nice analysis of the various AI principles that have come out over the last couple of years. I won't go into the details, but what the graph shows is that the majority of principles overlap quite heavily in the themes they reflect. Largely, they reflect the same kinds of ethical principles that we also find in fields like medical ethics, which to a large extent is good, because it means we have a certain general human agreement on these, and to a large extent they also follow principles from fundamental human rights. Now, AI, as I said, is increasingly at the top of the agenda of most governments.
This map shows in yellow the countries that have in recent times published AI principles, national AI strategies, or related work, indicating that these governments are thinking about whether to start introducing regulation. In effect, we are seeing parallel AI races: the AI technology development race, led by the US and China, but also an AI regulation race, where Europe seems to be the front runner at the moment. Importantly, if we look at which areas of this map are yellow, the majority is the global North, so to speak: North America and Europe (Australia in this context is often folded into the Europe category), plus China, China being an important mover in this space.

The major policy bodies working on AI are, as I mentioned, the European Commission, whose white paper on artificial intelligence is an important one. That paper came out in February, and the feedback on it is currently being processed. It is important because it is the first policy paper that really talks about specifics of how AI might end up being regulated by a large regulatory body like the European Commission. The Chinese government has published its principles around AI governance, which are largely in line with the OECD AI principles, and it is moving quite fast to try to establish a reference point that the rest of the world might anchor its regulation against. The Council of Europe is doing a lot of work, as is the OECD, whose work, as I mentioned, became reflected in the G20.

Importantly, if we think about all of these policy bodies, they are severely dominated by the Western way of thinking, by the global North. The OECD is basically a global North organization: Europe, North America, Japan, more or less, with some observer states; India is an observer state. UNESCO is working on something with more of a truly global perspective that would include the global South. However, the UNESCO work is not being heavily referred to in the current discussions. This might change, but at this moment the main reference point for a lot of governments is the OECD work. So there, too, we have a lack of diversity when it comes to the actual thinking around AI regulation.

So how could you categorize AI governance frameworks at a high level? Roughly speaking, we can think of six different ways in which government can respond to issues around things like bias, censorship, or social discrimination. On the left we have two types of market solutions, demand side and supply side. These are free-market developments: let the market sort itself out. If you are providing a product that is highly discriminatory and people have an alternative product they could turn to, the idea is that people will select the less discriminatory one and market forces will lead to an improvement of the systems. In practice, we have seen that this is problematic: a lot of this technology has strong network effects that capture the market, which has made it very difficult to establish the kind of actual competition that would lead to improvement in the products. Then, in the middle, we have company self-organization and branch self-organization.
This is basically companies doing internal governance, saying: we are going to do internal oversight and regulation in order to avoid state intervention. State intervention is the strong-law approach, where government says what you are and are not allowed to do and sets clear requirements, for instance the potential requirement to get prior certification before being allowed to deploy your product. Co-regulation sits in between: government indicates the types of issues that need to be addressed, but industry develops the details, with government providing oversight. Industry standards basically lie in this branch self-regulation space, and somewhat in the co-regulation space when states decide to refer to certain standards as part of regulatory requirements; but generally speaking, industry standards are part of the self-regulation, often referred to as soft-law, way of working.

This is an overview of how various standards bodies operate relative to each other, from the national level, through in this case the European level, to the international level. You can see my personal bias in this: because I am based in Europe, a lot of my work reflects the way of operating in Europe. At the bottom, every country basically has its own national standards body, which can develop nation-specific standards. You find these in everything from the shape of your plugs and sockets for appliances to safety standards and other kinds of standards. On top of that you can have regional standards bodies; in the case of Europe these are CEN and CENELEC, while this might not exist elsewhere, for instance in North America there isn't really a regional one. And on top of that you have the international level: organizations such as ISO, IEC, and IEEE, and also the ITU, are the relevant standards bodies when it comes to AI-related standards development. At this moment we see a focus from the national standards bodies on pooling their work and developing standards at the international level, so a lot of national standards body work is being channeled into developing ISO/IEC standards.

ISO/IEC currently has a number of working groups developing quite a range of standards related to AI. A lot of these focus on technical aspects, including the basic one of terminology: making sure that when you say AI or machine learning, we all agree on what that means. But there are also things like robustness and governance frameworks. Working Group 3 within ISO/IEC JTC 1/SC 42, to give the full title, is focused on trustworthiness issues, which includes questions such as bias and societal impacts. IEEE works somewhat in parallel to this system, in that national standards bodies are not directly reflected in IEEE work; IEEE participation is mostly either directly through individuals or sometimes through companies, depending on the particular type of IEEE standard. IEEE launched a whole series of standards focusing on AI ethics, the P7000 series, and I'll talk about that in a minute or so.
Just to highlight a couple of the ongoing standards developments that deal specifically with diversity and inclusion issues. Within the IEEE P7000 series we have P7000 itself, the model process for addressing ethical concerns during system design. This standard focuses on how you can introduce considerations around ethics into a system design process: how you integrate a stage of reflection on ethical concerns into the flow of developing an AI system. P7003 is the algorithmic bias considerations working group, and I'll talk about that in more detail. P7010, the only one of these that has already been published, is a recommended practice for assessing the impact of autonomous and intelligent systems on human well-being. It tries to provide guidance on thinking about how an AI system might impact human well-being in the broader sense. I include it here specifically because the working group made a particular effort to include philosophical perspectives beyond the Western Christian one that often implicitly dominates these standards developments, since participants in these working groups are frequently primarily from the West. They made sure to include Shinto and Ubuntu and various other philosophies as a basis for thinking about impact on human well-being. And within ISO/IEC, as I mentioned, there is the trustworthiness working group and, most specifically, the bias in AI systems and AI-aided decision making work that has come out. This is also a strong contribution to the space of thinking on diversity and inclusion.

IEEE's slogan is "Advancing Technology for Humanity", which is what drove the IEEE to think about the need to introduce work specifically reflecting on the ethics of AI systems. In the card below you see the official designation of the P7003 standard. This work around AI ethics is part of what is called the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. It currently has 14 standards under development, a bit too many for me to list here. Also part of this global initiative are the Ethically Aligned Design work, some work on ethics certification for AI systems, and some work on education.

I'm now switching to the P7003 algorithmic bias considerations standard specifically. An important starting point in our way of thinking is that algorithmic systems are socio-technical. It is important to keep in mind that even though we are building a technical system, a machine, it is ultimately being built by people within an organization, and they all live within a society that has a political, legal, and cultural context. All of these things are inherently carried with us when we engage with our work, and they shape the way our decision making happens. These things need to be considered, especially when thinking about the types of stakeholders that are going to be impacted and whether or not we have properly recognized their perspectives. So, algorithmic bias considerations: what do we actually mean by bias?
Effectively, what we mean is minimizing bias that is unintended, unjustified, and unacceptable. We phrase it this way because, depending on how you look at it, bias is basically just the fact that the outcome is not random, not uniformly distributed, and often bias in that sense is exactly the thing you are trying to achieve. The point is whether or not you are directing the outcomes in a way that is justified and intended.

Key causes of algorithmic bias tend to be rooted in an insufficient understanding of the context of use. This includes understanding who will actually be impacted by the system: who is the audience, who are the users who will have to use the system or will be impacted by it. A clear example of this type of failure has been social media content filtering that did not understand different languages or different cultural sensitivities, and therefore applied a uniform, US-centric perception of what appropriate content is, leading to various problems in different countries. Another cause is a failure to rigorously map decision criteria. What do I mean by that? During development a lot of decisions are being made, and some of these may be unconscious: decisions, for instance, not to consider a certain type of exception condition, or assumptions about how the system is going to be used that may not reflect how it will ultimately be used, because other people approach it from a different perspective. And finally, a failure to have explicit justifications for the chosen criteria. Explicit justification is something we focus on, because it is a way of making sure that decision making is rigorous, and it is an important element of communicating how decisions were made. It means having a justification, a way of explaining why you chose one type of fairness over another. Why did you choose fairness of process, treating everybody in the same way, as opposed to fairness of outcome, making sure that everybody gets an equal type of outcome, for instance?

This is a quick overview of the elements that go into the standard, and I want to highlight that we specifically include an informative section on cultural aspects: how different cultural backgrounds affect the way the system might be used, the way outcomes might be perceived and experienced, and the way the cultural background of the people involved in the development might influence and bias the system's performance. Stakeholder identification is very important: if you don't really understand who the stakeholders are, who the people are who are going to be impacted, you will not be able to make sure that you have things like a representative data set for them. There are risk and impact assessments supporting this kind of work. Representative data assurance: clearly, a lot of the discussion around bias in AI systems has focused on bias in the data sets, so an evaluation of that needs to be part of any work to make sure you have considered bias issues in your system. And system evaluation: are the outcomes biased? Test against that.
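As an illustration of the fairness-of-process versus fairness-of-outcome choice just mentioned, here is a minimal sketch in Python, with scores and thresholds invented for illustration (nothing here is taken from the P7003 standard):

```python
# Toy sketch: one shared decision rule ("fairness of process") can produce
# unequal selection rates, while equalizing rates ("fairness of outcome")
# requires judging the groups by different rules. Data is invented.
scores = {"group_A": [0.2, 0.4, 0.6, 0.8], "group_B": [0.1, 0.2, 0.3, 0.9]}

def selection_rate(vals, threshold):
    return sum(v >= threshold for v in vals) / len(vals)

# Same threshold for everyone: equal process, unequal outcomes (50% vs 25%).
for g, vals in scores.items():
    print(f"shared rule   {g}: {selection_rate(vals, 0.5):.0%} selected")

# Per-group thresholds chosen to equalize outcomes (50% vs 50%),
# at the price of applying a different bar to each group.
thresholds = {"group_A": 0.5, "group_B": 0.25}
for g, vals in scores.items():
    print(f"per-group bar {g}: {selection_rate(vals, thresholds[g]):.0%} selected")
```

Neither rule is inherently the correct one; which to use is exactly the kind of choice the standard asks you to justify explicitly for the context of use.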
Some of the questions that effectively need to be asked, then, are: who will be affected? What are the decision optimization criteria that you have actually implemented in your system? How are these criteria justified? And are these justifications acceptable in the context where the system is used?

I have hinted at this already: one of the problems with the whole space of work currently happening around AI ethics is a lack of diversity in the community that is actually working in this space. If you access this map by the Berkman Klein Center, for instance on their website, and zoom in on it, you will find that along the outside of the circle, where the different sources of principles are listed, they are predominantly European countries and principles coming from the United States, with the occasional South American national AI strategy. As far as published work on key ethical principles goes, it is predominantly European and North American (Canada is an important player in this), plus a couple from China; the rest of the world is mostly missing from this discussion. And that is problematic.

The same is true when we look at the way standards are being developed, such as ISO international standards. If you look at who is actually actively participating, they are heavily dominated by the global North, plus China, China being a strong mover in recent years and pushing quite heavily. We now find that Western countries are very concerned about the way China is pushing its perspective, its particular ideas for standards, in these communities; but we could argue that this is exactly what Western Europe and North America have previously been doing. The global South is, generally speaking, not really at the table when it comes to developing the standards. Standards development is also heavily dominated by industry players. They have a vested interest; they want these standards, because they are going to be the ones using them. But that means that if they are the only ones at the table, concerns from civil society will not be discussed and will not be included in the way the standards are being developed. And the people at the table are primarily men, which basically follows from the fact that most of them are industry people, and that industry is heavily male dominated.

And that is my presentation. I'm now open for questions.

Thank you, Ansgar. This was a completely new topic for many of us, and thank you for presenting it at this conference.