I'm Rachel Thomas. I'm the founding director of the Center for Applied Data Ethics at the University of San Francisco and also co-founder of fast.ai, together with Jeremy Howard. As for my background, I have a PhD in math, worked as a data scientist and software engineer in the tech industry, and have been working at USF and on fast.ai for the past four years. So, ethics issues are in the news. These articles, I think, are all from this fall, showing up at this intersection of how technology is impacting our world in increasingly powerful ways, many of which really raise concerns. I want to start by talking about three cases that I hope everyone working in technology knows about and is on the lookout for. Even if you only watch five minutes of this video, these are the three cases I want you to see. One is feedback loops. Feedback loops can occur whenever your model is controlling the next round of data you get: the data that's returned quickly becomes flawed by the software itself, and this can show up in many places. One example is recommendation systems. Recommendation systems are ostensibly about predicting what content the user will like, but they're also determining what content the user is even exposed to, and helping determine what has a chance of becoming popular. YouTube has gotten a lot of attention about this for heavily recommending many very damaging conspiracy theories. They've also effectively assembled content appealing to pedophiles out of what were innocent home movies, strung together because they happened to feature young girls in bathing suits or in their pajamas. So there are some really, really concerning results, and this is not something that anybody intended; we'll talk about this more later.
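The feedback-loop mechanism just described can be sketched in a few lines of code. This is a toy illustration of my own, not any real platform's algorithm: every item has identical true appeal, but the recommender shows items in proportion to their past clicks, so the model's output controls its own future data and early random luck compounds into runaway "popularity."

```python
import random

# Toy feedback-loop sketch (illustrative only, not any real recommender):
# the model recommends items in proportion to past clicks, and users can
# only click what they are shown -- so the next round of data is shaped
# by the model itself.

def simulate(n_items=5, n_rounds=10_000, seed=0):
    rng = random.Random(seed)
    clicks = [1] * n_items            # seed each item with one click
    for _ in range(n_rounds):
        # "recommend" an item with probability proportional to past clicks
        shown = rng.choices(range(n_items), weights=clicks)[0]
        if rng.random() < 0.5:        # every item is equally appealing
            clicks[shown] += 1        # this click feeds the next round's data
    return clicks

counts = simulate()
print(sorted(counts, reverse=True))
```

Despite all items being equally good, the click counts end up highly unequal: whichever item happened to get clicked early is shown more, so it gets clicked more, and so on. This is the same dynamic, in miniature, as a recommendation system amplifying whatever it happened to surface first.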
I think, particularly for many of us coming from a science background, we're used to thinking, oh, we observe the data. But really, whenever you're building products that interact with the real world, you're also controlling what the data looks like. The second case study I want everyone to know about comes from software that's used to determine poor people's health benefits. It's used in over half of the 50 states, and The Verge did an investigation into what happened when it was rolled out in Arkansas. There was a bug in the software implementation that incorrectly cut coverage for people with cerebral palsy or diabetes, including Tammy Dobbs, who's pictured here and was interviewed in the article. These are people who really needed this healthcare, and it was erroneously cut due to this bug. They couldn't get any sort of explanation, and there was no appeals or recourse process in place. Eventually this all came out through a lengthy court case, but it caused a lot of suffering in the meantime. So it's really important to implement systems with a way to identify and address mistakes, and to do so quickly and in a way that hopefully minimizes damage, because we all know software can have bugs and our code can behave in unexpected ways, and we need to be prepared for that. I wrote more about this idea in a post two years ago, "What HBR Gets Wrong About Algorithms and Bias." The third case study that everyone should know about: this is Latanya Sweeney, who's director of the Data Privacy Lab at Harvard. She has a PhD in computer science. She noticed several years ago that when you Googled her name, you would get ads saying "Latanya Sweeney, arrested?", implying that she has a criminal record. She's the only Latanya Sweeney, and she has never been arrested. She paid $50 to the background check company and confirmed that she's never been arrested.
She tried Googling some other names and noticed, for example, that Kristen Lindquist got much more neutral ads that just said "We found Kristen Lindquist," even though Kristen Lindquist has been arrested three times. Being a computer scientist, Dr. Sweeney studied this very systematically. She looked at over 2,000 names and found that this pattern held: disproportionately, African American names were getting ads suggesting the person had a criminal record, regardless of whether they did, while traditionally European American or white names were getting more neutral ads. This problem of bias in advertising shows up a lot. Advertising is the profit model for most of the major tech platforms, and it continues to pop up in high-impact ways. Just last year there was research showing how Facebook's ad system discriminates even when the person placing the ad is not trying to do so. For instance, take the same housing ad with the exact same text: if you change the photo between a white family and a Black family, it's served to very different audiences. This is something that can really impact people when they're looking for housing or applying for jobs, and it's a definite area of concern. Now I want to step back and ask: why does this matter? A very extreme example is that data collection has played a pivotal role in several genocides, including the Holocaust. This is a photo of Adolf Hitler meeting with the CEO of IBM at the time; I think this photo was taken in 1937. IBM continued to partner with the Nazis long past when many other companies broke their ties. They produced machines that were used in concentration camps to code whether people were Jewish and how they were executed. And this is also different from now, where you might sell somebody a computer and then never hear from them again.
These machines required a lot of maintenance and an ongoing relationship with the vendor for upkeep and repair. A Swiss judge ruled that it "does not seem unreasonable to deduce that IBM's technical assistance facilitated the tasks of the Nazis in the commission of their crimes against humanity, acts also involving accountancy and classification by IBM machines and utilized in the concentration camps themselves." I'm told that they haven't gotten around to apologizing yet. I guess they've been busy. Terrible, yeah. Okay. So this is a very sobering example, but I think it's important to keep in mind what can go wrong and how technology can be used for very, very terrible harm. This raises questions that we all need to grapple with: how would you feel if you discovered that you had been part of a system that ended up hurting society? Would you even know? Would you be open to finding out how things you had built may have been harmful? And how can you help make sure this doesn't happen? It's also important to think about unintended consequences: how your tech could be used or misused, whether by harassers, by authoritarian governments, or for propaganda or disinformation. And on a more concrete level, you could even end up in jail. There was a Volkswagen engineer who got prison time for his role in the diesel cheating case. If you remember, this is where Volkswagen was cheating on emissions tests, and he was one of the programmers who was a part of that. That person was just following orders from his boss, but that is not a good excuse for doing something unethical, so it's something to be aware of. Ethics is the discipline dealing with what's good and bad; it's a set of moral principles.
It's not a set of answers; it's learning what sorts of questions to ask and how to weigh these decisions. I'll say more about ethical foundations and different ethical philosophies later in this lesson, but first I'm going to start with some use cases. Ethics is not the same as religion, laws, social norms, or feelings, although it overlaps with all of these. It's not a fixed set of rules; it's well-founded standards of right and wrong. Clearly, not everybody agrees on the ethical action in every case, but that doesn't mean that anything goes or that all actions are considered equally ethical. There are many things that are widely agreed upon, and there are philosophical underpinnings for making these decisions. Ethics is also the ongoing study and development of our ethical standards, a never-ending process of learning to practice our ethical wisdom. Here I'm referring to a few articles from the Markkula Center for Applied Ethics at Santa Clara University, and I'll come back to them several times. In particular, the work of Shannon Vallor, Brian Green, and Irina Raicu is fantastic, and they have a lot of resources, some of which I'll circle back to later in this talk. I spent years of my life studying ethics; it was my major at university, and I spent so much time on the question of what ethics is. I think my takeaway is that studying the philosophy of ethics was not particularly helpful in learning about ethics. Yes, and I will try to keep this very applied and very practical, and very tech-industry specific: what do you need in terms of applied ethics? Yeah, and the Markkula Center is great; somehow they take stuff that I thought was super dry and turn it into useful checklists and things. I did want to note something that was really neat. Casey Fiesler is a professor at the University of Colorado whom I really admire.
She created a crowdsourced spreadsheet of tech ethics syllabi, maybe two years ago, and over 200 syllabi were entered into it. She then did a meta-analysis, looking at all sorts of aspects of the syllabi, what's being taught and how, and published a paper on it: "What Do We Teach When We Teach Tech Ethics?" A few interesting things about it: it shows there are a lot of ongoing discussions and a lack of agreement on how best to teach tech ethics. Should it be a standalone course, or worked into every course in the curriculum? Who should teach it: a computer scientist, a philosopher, or a sociologist? She analyzed, for each syllabus, what the course home and the instructor home were, and you can see that the instructors came from a range of disciplines: computer science, information science, philosophy, science and technology studies, engineering, law, math, business. What topics to cover? A huge range of topics can be covered, including law and policy, privacy and surveillance, inequality, justice and human rights, environmental impact, AI and robots, professional ethics, work and labor, cybersecurity; the list goes on and on. This is clearly more than can be covered in even a full semester-length course, and certainly not in a single lecture. What learning outcomes? This is an area where there's a bit more agreement: the number one skill courses were trying to teach was critique, followed by spotting issues and making arguments. So a lot of this is simply learning to spot what the issues are, and how to critically evaluate a piece of technology or a design proposal to see what could go wrong and what the risks could be. All right. So we're going to go through a few different core topics, and as I suggested, this is going to be an extreme subset of what could be covered.
We've tried to pick things that we think are very important and high impact. One is recourse and accountability. I already shared the example earlier of the system determining poor people's healthcare benefits having a bug. Something that was terrible about this is that nobody took responsibility even once the bug was found. The creator of the algorithm was interviewed and asked: should people be able to get an explanation for why their benefits were cut? He gave a very callous answer: yeah, they probably should, but then, I should probably dust under my bed too, and who's going to do that? And then he ended up blaming the policymakers for how they had rolled out the algorithm. The policymakers, in turn, could blame the software engineers who implemented it. So there was a lot of passing the buck here. danah boyd has said that it has always been a challenge for bureaucracies to assign responsibility, that bureaucracy is often used to evade responsibility, and that today's algorithmic systems are often extending bureaucracy. A couple of questions and comments about cultural context: Nalini notes that there didn't seem to be any mention of cultural contexts for ethics as part of those syllabi, and somebody else was asking: is this culturally dependent, and how do you deal with that? It is culturally dependent, and I will mention this briefly later on. I'm going to share three different ethical philosophies that come from the West, and we'll talk briefly, one slide, about, for instance, the number of Indigenous data sovereignty movements active right now; I know the Maori data sovereignty movement has been particularly active. Different cultures do have different views on ethics, and I think that cultural context is incredibly important.
We will not get into it tonight, but there's also a growing field studying algorithmic colonialism: what are the issues when technologies built in one particular country and culture are implemented halfway across the world, in a very different cultural context, often with little to no input from people living in that culture? That said, there are things that are widely, although not universally, agreed on. For instance, the Universal Declaration of Human Rights: despite the name, it is not universally accepted, but many, many countries have accepted it as a human rights framework and its contents as fundamental rights. So there are principles that are often held cross-culturally, although it's rare for anything to be truly universal. Returning to the topic of accountability and recourse, the thing to keep in mind is that data contains errors. There was a database used in California that supposedly tracks gang members, and an auditor found that 42 babies under the age of one had been entered into it. Something concerning about the database is that it's basically never updated: people are added, but they're not removed, so once you're in there, you're in there. And 28 of those babies were marked as having admitted to being gang members. Keep in mind that this is just a really obvious example of error; how many other totally wrong entries are there? Another example of data containing errors involves the three credit bureaus in the United States. The FTC's large-scale study of credit reports found that 26% of participants had at least one mistake in their files, and 5% had errors that could be devastating. And this is the headline of an article written by a public radio reporter who went to rent an apartment.
The landlord called him back afterwards and said, your background check showed that you had firearms convictions. This person did not have any firearms convictions. In most cases the landlord would probably not even tell you, or let you know that's why you weren't getting the apartment. So this guy looked into it (I should note that he was white, which I'm sure helped him get the benefit of the doubt) and found the error. He made dozens of calls and could not get it fixed, until he told them that he was a reporter and was going to be writing about it, which is something most of us would not be able to do. And even once he had pinpointed the error, and had talked to the county clerk in the place he used to live, it was still a very difficult process to get it corrected. This can have a huge, huge impact on people's lives. There's also the issue of technology being used in ways the creators may not have intended. For instance, facial recognition is pretty much entirely being developed on adults, yet the NYPD is putting photos of children as young as age 11 into databases, and we know the error rates are higher for children. This is not how the technology was developed, so this is a serious, serious concern. And there are a number of misuses. The Georgetown Center on Privacy and Technology, which is fantastic (you should definitely be following them), did a report, "Garbage In, Garbage Out," looking at how police were using facial recognition in practice, and they found some really concerning examples. For instance, in one case the NYPD had a photo of a suspect that wasn't returning any matches, and they said, well, this person kind of looks like Woody Harrelson. So they Googled the actor Woody Harrelson, put his face into the facial recognition system, and used that to generate leads.
This is clearly not the correct use at all, but it's a way the technology is being used, and there's a total lack of accountability here. There was also a study of cases, across all 50 states, of police officers abusing confidential databases to look up ex-romantic partners or activists. Here the issue is not necessarily an error in the data, although that can be present as well, but how the data can be misused by its users. All right, the next topic is feedback loops and metrics. I talked a bit about feedback loops at the beginning, as one of the three key cases. This is a topic I wrote a blog post about this fall, "The Problem with Metrics is a Big Problem for AI," and then, together with David Uminsky, who's director of the Data Institute, I expanded it into a paper, "Reliance on Metrics is a Fundamental Challenge for AI," which was accepted to the Ethics and Data Science conference. Overemphasizing metrics can lead to a number of problems, including manipulation, gaming, a myopic focus on short-term goals (because it's easier to track short-term quantities), and unexpected negative consequences. And much of AI and machine learning centers on optimizing a metric. This is both the strength of machine learning, which has gotten really, really good at optimizing metrics, and, I think, inherently a weakness or limitation. I'm going to give a few examples, and this can happen not just in machine learning but in analog systems as well. This one is from a study of what happened when England's public health system implemented a lot more numeric targets in the early 2000s; the study was called "What's Measured Is What Matters." One of the targets was reducing ER wait times, which seems like a good goal. However, this led to canceling scheduled operations in order to draft extra staff into the ER.
If a hospital felt there were too many people in the ER, they would start canceling operations so they could get more doctors; they required patients to wait in queues of ambulances, because time waiting in an ambulance didn't count towards ER wait time; and they turned stretchers into beds by putting them in hallways. There were also big discrepancies between the numbers reported by hospitals and by patients: if you asked the hospital how long, on average, people were waiting, you got a very different answer than when you asked the patients how long they had to wait. Another example is essay grading software, which I believe is now being used in about 20 states in the United States. It tends to focus on metrics like sentence length, vocabulary, spelling, and subject-verb agreement, because these are the things we know how to measure with a computer. But it can't evaluate things like creativity or novelty. As a result, gibberish essays with lots of sophisticated words score well, and there are even examples of people creating computer programs to generate such gibberish "sophisticated" essays, which are then graded by this other computer program and rated highly. There's also bias: essays by African American students received lower grades from the computer than from expert human graders, and essays by students from mainland China received higher scores from the computer than from expert human graders; the authors of the study thought this result suggests those students may have been using chunks of pre-memorized text that score well. These are just two examples; I have a bunch more in the blog post, and even more in the paper, of ways that metrics can invite manipulation and gaming whenever they're given a lot of emphasis.
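The essay-grading failure mode above can be made concrete with a deliberately naive scorer. This is a caricature I made up, not the actual software used by any state: it rewards only surface features that are easy to measure (word length, vocabulary variety, essay length), so word salad built from long words beats a clear, simple sentence.

```python
# Caricature of a surface-feature essay scorer (illustrative only):
# it measures what is easy to measure and nothing about meaning,
# so it is trivially gamed by sophisticated-sounding gibberish.

def naive_score(essay: str) -> float:
    words = essay.lower().split()
    if not words:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)   # "vocabulary"
    vocab_richness = len(set(words)) / len(words)            # word variety
    return avg_word_len * vocab_richness * len(words) ** 0.5 # length bonus

clear = "The cat sat on the mat because it was warm and the sun was out."
gibberish = ("Multitudinous perspicacious epistemological ramifications "
             "obfuscate quintessentially heterogeneous paradigmatic "
             "juxtapositions notwithstanding circumlocutory manifestations")

print(naive_score(gibberish) > naive_score(clear))  # the gibberish wins
```

Optimizing students (or essay-generating programs) against such a metric produces exactly the gamed, meaningless essays the studies found; the metric is a proxy for writing quality, and once it becomes the target, the proxy breaks down.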
This is Goodhart's Law, a law a lot of people talk about: the idea that the more you rely on a metric, the less reliable it becomes. So, returning to this example of feedback loops and recommendation systems: Guillaume Chaslot is a former Google/YouTube engineer (YouTube is owned by Google). He wrote a really great blog post, he's done a ton to raise awareness about this issue, and he founded the nonprofit AlgoTransparency, which tries to externally monitor YouTube's recommendations. He has partnered with the Guardian and the Wall Street Journal to do investigations. He wrote a post about how, in the earlier days, the recommendation system was designed to maximize watch time. And this is something else that's often going on with metrics: any metric is just a proxy for what you truly care about. Here, the team at Google was saying, well, if people are watching more YouTube, that signals to us that they're happier. However, this also ends up incentivizing content that tells you the rest of the media is lying, because believing that everybody else is lying will encourage you to spend more time on a particular platform. Guillaume wrote a great post about this mechanism, and it's not just YouTube: I think any recommendation system could be susceptible to this, and there has been a lot of talk about issues with recommendation systems across platforms. It is something to be mindful of, and something the creators did not anticipate. Then last year, Guillaume gathered this data: here the x-axis is the number of YouTube channels recommending a video, and the y-axis is the log of the views. We see an extreme outlier, which was Russia Today's take on the Mueller report.
Guillaume observed this, and it was then picked up by the Washington Post. It strongly suggests that Russia Today has perhaps gamed the recommendation algorithm, which is not surprising; it's something I think many content creators are conscious of, experimenting to see what gets more heavily recommended, and thus more views. It's also important to note that our online environments are designed to be addictive. What we click on is often used as a proxy for what we enjoy or what we like, but that's not necessarily the preference of our best selves or our higher selves; it's what we're clicking on in a highly addictive environment that often appeals to some of our lower instincts. Zeynep Tufekci uses the analogy of a cafeteria that shoves salty, sugary, fatty foods in our faces and then learns that, hey, people really like salty, sugary, fatty foods, which I think most of us do, in a very primal way. But our higher self often says, oh, I don't want to be eating junk food all the time. And online, we often don't have great mechanisms to say, I really want to read more long-form articles that took months to research and are going to take a long time to digest. While we may want to do that, our online environments are not always conducive to it. Yes? Can I make a comment about the "false sense of security" argument, which is very relevant to masks and things? Did you have anything to say about this false sense of security argument? Can you say more? It's common feedback at the moment that people shouldn't wear masks because it creates a false sense of security. Does that kind of argument make sense to you from an ethical point of view? No, I don't think that's a good argument at all. In general, many other people, including Jeremy, have pointed this out.
There are so many actions we take to make our lives safer, whether that's wearing seatbelts, wearing helmets when biking, or practicing safe sex: all sorts of things where we really want to maximize our safety. Zeynep had a great thread on this today: it's not that there can never be any effect in which people gain a false sense of security, but it is something you would really want to gather data on and build a strong case around, not just assume will happen. And in most cases people can think of, even if there is a small second-order effect, doing something that increases safety tends to have a much larger impact on actually increasing safety. Do you have anything to add to that? Yeah, as I mentioned before, a lot of our incentives are focused on short-term metrics; long-term effects are much harder to measure and often involve complex relationships. And the fundamental business model of most of the tech companies is built around manipulating people's behavior and monopolizing their time. I don't think advertising is inherently bad, but I think it can be negative when taken to an extreme. There's a great essay by James Grimmelmann, "The Platform Is the Message," in which he points out that these platforms are structurally at war with themselves: the same characteristics that make outrageous and offensive content unacceptable are what make it go viral in the first place. So there's a real tension here, in which the things that make content really offensive or unacceptable to us are also, in many cases, what fuel its popularity and promotion. It's an interesting essay because he does a really in-depth dive on the Tide Pod Challenge, which was a meme around eating Tide Pods (which are poisonous; do not eat them). And he really analyzes it.
It's a great look at meme culture. He argues there's probably no example of someone talking about the Tide Pod Challenge that isn't partially ironic, which is common in memes: whatever you're saying, there are layers of irony, different groups interpret them differently, and even when you try to counteract them, you're still promoting them. With the Tide Pod Challenge, a lot of celebrities were telling people not to eat Tide Pods, but that was also perpetuating the popularity of the meme. So this is an essay I would recommend; I think it's pretty insightful. We'll get to disinformation shortly, but the major tech platforms often incentivize and promote disinformation. This is unintentional, but it is somewhat built into their design and architecture, their recommendation systems, and ultimately their business models. Then, on the topic of metrics, I want to bring up the idea of blitzscaling. The premise is that if a company grows big enough and fast enough, profits will eventually follow; it prioritizes speed over efficiency and risks potentially disastrous defeat. Tim O'Reilly wrote a really great article last year about many of the problems with this approach, which is incredibly widespread and is, I would say, the fundamental model underlying a lot of venture capital. With blitzscaling, investors end up anointing winners, as opposed to market forces doing so; it tends to lend itself to creating monopolies and duopolies; it's bad for founders; and people end up spreading themselves too thin. So there are a number of significant downsides. Why am I bringing this up in an ethics lesson, when we're talking about metrics? Because hockey-stick growth requires automation and a reliance on metrics.
Also, prioritizing speed above all else doesn't leave time to reflect on ethics, and that's hard: I think you often do have to pause to think about ethics. And if you follow this model, when you do have a problem, it's often going to show up on a huge scale, because you've scaled very quickly. So this is something to at least be aware of. One person asks: is there a dichotomy between AI ethics, which seems like a very first-world problem, and wars, poverty, and environmental exploitation, which are a different level of problem? And there's an answer here, which maybe you can comment on, whether you agree or have anything to add: AI ethics, they're saying, is very important also for other parts of the world, particularly in areas with high cell phone usage. For example, many countries in Africa have high cell phone penetration; people get their news from Facebook, WhatsApp, and YouTube, and though that's useful, it's been the source of many problems. Do you have any comments on that? Yeah, so on the first question: AI ethics, as I noted earlier (and I'm using the phrase "data ethics" here), is very broad and refers to a lot of things. Some people talk about whether, in the future, computers can achieve sentience, and what the ethics around that would be; that is not my focus at all. I am very much focused on, and this is our mission with the Center for Applied Data Ethics at the University of San Francisco, how people are being harmed now: what are the most immediate harms? In that sense, I don't think that data ethics has to be a first-world or futuristic issue; it's what's happening now. And as the person said, there are examples. One example I'll get to later is the genocide in Myanmar, in which the Muslim minority, the Rohingya, are experiencing genocide.
The UN has found that Facebook played a determining role in that, which is really intense and terrible. So I think that's an example of technology leading to very real harm now. On WhatsApp, which is owned by Facebook, there have been issues with people spreading disinformation and rumors, and it's led to dozens of lynchings in India: people spread false rumors of a kidnapper coming around, and then, in these small remote villages, a visitor or stranger shows up and gets killed. WhatsApp also played a very harmful role in the election of Bolsonaro in Brazil and the election of Duterte in the Philippines. So I think technology is having a very immediate impact on people, and those are the types of ethical questions I'm really interested in, and that I hope you are interested in as well. Do you have anything else to say about that? And I will talk about disinformation; I realize those questions were somewhat disinformation-focused, and I'm going to talk about bias first, then disinformation. Yes, a question: when we talk about ethics, how much of this is intentional unethical behavior? I see a lot of the examples as more incompetent behavior or bad modeling, where the products or models are rushed without sufficient testing or thought around bias and so forth, but not necessarily intentional. Yeah, I agree with that; I think most of this is unintentional. Though, and we'll get into some cases, I think that in many cases the profit incentives are misaligned, and I do think that when people are earning a lot of money, it is very hard to consider actions that would reduce their profits, even if those actions would prevent harm and be more ethical.
And so I think there's some point at which valuing profit over how people are being harmed becomes intentional; when exactly that happens is a question to debate. But I don't think people are setting out to say, I want to cause a genocide, or I want to help an authoritarian leader get elected. Most people are not starting with that. I think sometimes it's carelessness and thoughtlessness, but I do think we are responsible for that, and we're responsible for being more careful and more thoughtful in how we approach things. All right, so bias. Bias is an issue that has gotten a lot of attention, which is great, and I want to go a little more in depth, because sometimes discussions of bias stay a bit superficial. There was a great paper last year by Harini Suresh and John Guttag that came up with a taxonomy of different types of bias and their different sources in the machine learning pipeline. It was really helpful, because different sources have different causes, and they also require different approaches for addressing them. Harini wrote a blog post version of the paper as well, which I love when researchers do; I hope more of you, if you're writing an academic paper, also write the blog post version. I'm just going to go through a few of these types. One is representation bias. I would imagine many of you have heard of Joy Buolamwini's work, Gender Shades, which has rightly received a lot of publicity. She and Timnit Gebru investigated commercial computer vision products from Microsoft, IBM, and Face++, and then Joy Buolamwini and Deb Raji did a follow-up study that looked at Amazon, Kairos, and several other companies. And the typical result they found basically everywhere was that these products performed significantly worse on dark-skinned women.
So they were doing worse on people with darker skin compared to lighter skin, worse on women than on men, and then at the intersection, dark-skinned women had very high error rates. One example is IBM: their product was 99.7% accurate on light-skinned men and only 65% accurate on dark-skinned women. And again, this was a commercial computer vision product that had been released. Question: there's a question from the TWiML study group about the Volkswagen example. In many cases it's management that drives and rewards unethical behavior; what can an individual engineer do in a case like this, especially in a place like Silicon Valley where people move companies so often? Yeah, I think that's a great point, and that is an example where I would have much rather seen people who were higher ranking doing jail time, because I think they were driving it. It's great to remember, and I know many people in the world don't have this option, that for many of us working in tech, particularly in Silicon Valley, we tend to have a lot of options, often more options than we realize. I talk to people frequently who feel trapped in their jobs even though they're software engineers in Silicon Valley and so many companies are hiring. So I think it is important to use that leverage. A lot of the employee organizing movements are very promising and can be useful, but also really try to vet the ethics of the company you're joining, and be willing to walk away if you're able to do so. That's a great question. So for this example of representation bias, the way to address it is to build a more representative dataset. It's very important to keep the consent of the people involved in mind if you're using pictures of people, but Joy Buolamwini and Timnit Gebru did this as part of Gender Shades.
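The kind of disaggregated evaluation Gender Shades performed, reporting accuracy per subgroup rather than one aggregate number, can be sketched in a few lines. This is a toy illustration with invented labels and groups, not their actual methodology or data:

```python
# Toy sketch: compute accuracy separately for each subgroup, so that poor
# performance on one group can't hide inside a good overall average.
# All data below is invented for illustration.
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return a dict mapping each group label to that group's accuracy."""
    per_group = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for t, p, g in zip(y_true, y_pred, groups):
        per_group[g][0] += int(t == p)
        per_group[g][1] += 1
    return {g: correct / total for g, (correct, total) in per_group.items()}

# A model that is perfect on group "A" but right only 1 of 3 times on "B":
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
groups = ["A", "A", "A", "B", "B", "B"]
print(accuracy_by_group(y_true, y_pred, groups))
# {'A': 1.0, 'B': 0.3333333333333333}
```

The overall accuracy here would be 4/6, which looks passable; only the per-group breakdown reveals how badly the model serves group "B".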
However, the fact that this was a problem not just for one company but for basically every company they looked at was due to an underlying issue: in machine learning, benchmark datasets spur a lot of research, and several years ago all the popular facial datasets were primarily of light-skinned men. For instance, in IJB-A, a popular face dataset from a few years ago, only about 4% of the images were of dark-skinned women. Yes, question: I've been worried about COVID-19 contact tracing and the erosion of privacy, location tracking, private surveillance companies, and so on. What can we do to protect our digital rights post-COVID? Can we look to any examples in history of what to expect? That is a huge question and something I have been thinking about as well. I'm going to put that off until later; in the course I teach, I have an entire unit on privacy and surveillance, which I won't cover in tonight's lecture, but I can share some materials, although I'm already rethinking how I'm going to teach privacy and surveillance in the age of COVID-19 compared to two months ago when I taught it the first time. It is something I think about a lot, and I will talk about it later if we have time, or on the forums if we don't. That's a great, very important question. On that topic, I will say, though I have not had time to look into them yet, I do know there are groups working on more privacy-protecting approaches to tracking, and there are also groups putting out guidance on what safeguards need to be in place to do tracking responsibly, if we are going to do it at all. Yes, I've been looking at that too. It does seem like this is a solvable problem with technology; not all of these problems are, but you can certainly store tracking history on somebody's cell phone.
And then you could have something where, when you've been infected, you say so, and at that point you could tell people that they've been exposed by sharing the location history in a privacy-preserving way. I think some people are working on that; I'm not sure it's a particularly difficult technical problem. So sometimes there are ways to provide the minimum level of functionality an application needs whilst keeping privacy. Yeah. And then I think it is very important to also have things like a clear expiration date. Looking back at 9/11 in the United States, it ushered in all these laws that we're now stuck with that have really eroded privacy. For anything we do around COVID-19, we should be very clear: we are doing this just for COVID-19, there's a time limit, it expires, and it's for this clear purpose. And there are also the issues I mentioned earlier about data containing errors. I know this has already been an issue in some other countries taking more surveillance-focused approaches: what about when the data is wrong, and people are getting quarantined for no reason and don't even know why? So be mindful of those, but we'll talk more about this later on. Back to bias. We talked about benchmarks: when a benchmark that's widely used has bias, that bias is replicated at scale, and we're seeing this with ImageNet as well, which is probably the most widely studied computer vision dataset out there. Two-thirds of the ImageNet images are from the West. The pie chart shows that 45% of the images in ImageNet are from the United States, 7% from Great Britain, 6% from Italy, 3% from Canada, and 3% from Australia. We're covering a lot of this pie without having gotten outside the West.
And this has shown up in concrete ways in classifiers trained on ImageNet. One of the categories is bridegroom, a man getting married; there are a lot of cultural components to that, and so classifiers have much higher error rates on bridegrooms from the Middle East or from the global South. There are people now working to diversify these datasets, but it is quite dangerous that they can be so widely built on at scale, or have been so widely built on at scale, before these biases were recognized. Another key study is the COMPAS recidivism algorithm, which is used in determining who has to pay bail (in the US, a very large number of people are in prison without even having had a trial yet, just because they are too poor to afford bail), as well as in sentencing and parole decisions. ProPublica did a famous investigation in 2016, which I imagine many of you have heard of, in which they found that the false positive rate for black defendants was nearly twice as high as for white defendants: black defendants who did not go on to reoffend were nearly twice as likely to have been labeled high risk. A study from Dartmouth found that the software is no more accurate than Amazon Mechanical Turk workers, random people on the internet. The software is also a proprietary black box using over 130 inputs, and it's no more accurate than a linear classifier on three variables. Yet it's still in use in many states; Wisconsin is one place where it was challenged, yet the Wisconsin Supreme Court upheld its use. If you're interested in the topic of how you define fairness, because there is a lot of intricacy here, and I don't know anybody working on this who thinks that what COMPAS is doing is right, but they're using a different definition of fairness, Arvind Narayanan has a fantastic tutorial, 21 Fairness Definitions and Their Politics, that I highly recommend.
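The disparity ProPublica measured, the false positive rate computed separately for each group, can be sketched like this. The data and group labels here are invented for illustration; this is not the COMPAS dataset or ProPublica's actual analysis code:

```python
# Toy sketch of a false-positive-rate audit: the FPR is the fraction of
# people who did NOT reoffend (true label 0) who were nonetheless flagged
# as high risk (prediction 1). All data below is invented.
def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN): the share of true negatives flagged positive."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0

def fpr_by_group(y_true, y_pred, groups):
    """Compute the FPR separately for each group label."""
    return {
        g: false_positive_rate(
            [t for t, gi in zip(y_true, groups) if gi == g],
            [p for p, gi in zip(y_pred, groups) if gi == g],
        )
        for g in set(groups)
    }

y_true = [0, 0, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
res = fpr_by_group(y_true, y_pred, groups)
print(res["A"], res["B"])  # group B's FPR is twice group A's
```

A model can have identical overall accuracy across groups and still show exactly this kind of disparity, which is why the choice of metric matters so much in the fairness-definition debate.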
So, going back to this taxonomy of types of bias: this is an example of historical bias. Historical bias is a fundamental, structural issue with the first step of the data generation process, and it can exist even given perfect sampling and feature selection. With the image classifier, we could go gather a more representative set of images, and that would help address the problem. That is not the case here: gathering more data on the US criminal justice system won't help, because that data is all going to be biased; the bias is baked into our history and our current state. This is good to recognize. One thing that can be done to at least try to mitigate it is to really talk to domain experts and to the people impacted. A really positive example of this is a tutorial at the Fairness, Accountability, and Transparency conference that Kristian Lum, the lead statistician for the Human Rights Data Analysis Group and now a professor at UPenn, organized together with a former public defender, Elizabeth Bender, a staff attorney for New York's Legal Aid Society, and Terrence Wilkerson, an innocent man who was arrested and could not afford bail. Elizabeth and Terrence were able to provide a lot of insight into how the criminal justice system works in practice, which is often very different from the cleaner, more logical abstractions that computer scientists deal with; it's really important to understand the intricacies of how your work is going to be implemented and used in these messy, complicated real-world systems. Question: aren't AI biases transferred from real-life biases? For instance, people being treated differently is an everyday phenomenon, isn't it? That's correct, yes.
So yes, this is often coming from real-world biases, and I'll come to this in a moment, but algorithmic systems can amplify those biases and make them even worse; still, they are usually learned from existing data. I asked because I often see this raised as a reason not to worry about AI bias. Well, I'm going to get to that in a moment, actually, in two slides, so hold on to that question. I just want to talk about one other type of bias first: measurement bias. This was an interesting paper by Sendhil Mullainathan and Ziad Obermeyer, where they looked at historical electronic health record data to try to determine which factors are most predictive of stroke, the idea being that this could be useful for, say, prioritizing patients in the ER. They found that the number one most predictive factor was prior stroke, which makes sense. Second was cardiovascular disease; that also seems reasonable. The third most predictive factor, still very predictive, was accidental injury, followed by having a benign breast lump, a colonoscopy, or sinusitis. Now, I'm not a medical doctor, but I can tell something weird is going on with factors three through six: why would these things be predictive of stroke? Does anyone want to guess why this might be? The first answer from the chat was: they test for it any time someone has a stroke. Others: confirmation bias; overfitting; because those patients happen to be in the hospital already; biased data, since EHRs record these events; the data was taken before certain advances in medical science. These are all good guesses, not quite what I was looking for, but good thinking. That's such a nice way of saying no. So what the researchers say here is that this was about which patients utilize health care a lot and which don't; they call it high-utilization versus low-utilization of health care.
And there are a lot of factors that go into this: who has health insurance and who can afford their co-pays; there may be cultural factors; and there is racial and gender bias in how people are treated. So, a lot of factors, but basically, people who utilize health care a lot will go to a doctor when they have sinusitis, and they will also go in when they're having a stroke; people who do not utilize health care much are probably not going to go in for either. What the authors write is that we haven't measured stroke, which is a region of the brain being denied new blood and oxygen. What we've measured is who had symptoms, went to the doctor, received tests, and got a diagnosis of stroke. That seems like it might be a reasonable proxy for who had a stroke, but a proxy is never exactly what you wanted, and in many cases that gap ends up being significant. This is just one form that measurement bias can take, but it's something to really be on the lookout for, because it can be quite subtle. And so now, returning to a point that was brought up earlier: aren't people biased? Yes, yes we are. There have been dozens and dozens, if not hundreds, of studies on this, but I'm just going to quote a few, all of which are linked in a Sendhil Mullainathan New York Times article if you want to find the studies; this all comes from peer-reviewed research. When doctors were shown identical files, they were much less likely to recommend a helpful cardiac procedure to black patients compared to white patients; that was the same file, just changing the race of the patient.
When bargaining for a used car, black people were offered initial prices $700 higher and received fewer concessions. Responding to apartment rental ads on Craigslist with a black name elicited fewer responses than with a white name. And an all-white jury was 16 percentage points more likely to convict a black defendant than a white one, but when a jury had just one black member, it convicted both at the same rate. I share these to show that no matter what type of data you're working with, whether that's medical data or sales data or housing data or criminal justice data, it's very likely that there's bias in it. There's a question? No, I was just going to say I find that last one really interesting, this idea that a single black member of a jury has some kind of anchoring impact. I'm sure you're going to talk about diversity later, but I just want to keep in mind that maybe even a tiny bit of diversity reminds people that there's a range of different types of people and perspectives. That's a great point, yeah. And so, the question that was asked earlier: why does algorithmic bias matter? I've just shown you that humans are really biased too, so why are we talking about algorithmic bias? People have brought this up, like, what's the fuss about? I think algorithmic bias is very significant and worth talking about, and I'm going to share four reasons. One is that machine learning can amplify bias: it's not just encoding existing biases, in some cases it's making them worse, and there have been a few studies on this. One I like is from Maria De-Arteaga of CMU, where they took people's professional biographies, I think from LinkedIn, and what they found is that imbalances ended up being compounded. So, in the group of surgeons,
only 14% were women; however, among the true positives, when trying to predict the job title from the biography, women were only 11%. So the imbalance got worse: there's an asymmetry where the algorithm has learned that it's safer not to guess surgeon for women. So that's one reason. Another reason algorithmic bias is a concern is that algorithms are used very differently from human decision makers in practice. People sometimes talk about them as though they are plug-and-play interchangeable with humans: this human is biased, the algorithm is this biased, why don't we just substitute it in? However, the whole system around the decision ends up being different in practice. One aspect of this is that people are more likely to assume algorithms are objective or error-free, even when given the option of a human override. Even if you tell a judge, I'm just giving you this recommendation, you don't have to follow it, when it's coming from a computer, many people are going to take it as objective. In some cases there may also be pressure from a boss not to disagree with the computer; oftentimes, nobody's going to get fired for going with the computer's recommendation. Algorithms are also more likely to be implemented with no appeals process in place, and we saw that earlier when we were talking about recourse. Algorithms are often used at scale, so they can replicate an identical bias at scale. And algorithmic systems are cheap. All of these, I think, are interconnected: in many cases, algorithmic systems are being implemented not because they produce better outcomes for everyone, but because they're a cheaper way to do things at scale. Offering a recourse process is more expensive; being on the lookout for errors is more expensive; these are cost-cutting measures. Cathy O'Neil talks about many of these themes in her book Weapons of Math Destruction, under the idea that the privileged are processed by people, the poor by algorithms. There's a question? Any questions? This seems like an intensely deep topic needing specialized expertise to avoid getting it wrong. If you were building an ML product, would you approach an academic institution to collaborate on this? Do you see the data product development triad becoming a quartet, involving an ethics or data privacy expert? So, I think interdisciplinary work is very important. I would definitely focus on trying to find domain experts in whatever your particular domain is, people who understand the intricacies of that domain. With an academic, it depends: you want to make sure you get someone who is applied enough to understand how things happen in industry. But I think involving more people, and people from more fields, is a good approach on the whole. Next question: someone invents and publishes a better ML technique, like attention or transformers, and then a graduate student demonstrates using it to improve facial recognition by five percent. Then a small startup publishes an app that does better facial recognition, and a government uses the app to study downtown walking patterns and endangered species, and after these successes, for court-ordered monitoring; and then a repressive government takes that method to identify ethnicities, and you get a genocide. No one made a huge ethical error at any incremental step, yet the result is horrific. I have no doubt that Amazon will soon serve up a personally customized price for each item that maximizes their profits.
How can such ethical creep be addressed, where the effect is remote from many small causes? Yeah, so that's a great summary of how these things can happen incrementally. I'll talk about some tools towards the end of this lesson that hopefully can help. Part of it is that I think we need to get better at trying to think a few more steps ahead than we have been. In particular, we've seen examples of this: there was the study of how to identify protesters in a crowd even when they had scarves or sunglasses or hats, and when the researchers were questioned, they said, oh, it never even occurred to us that this could be misused; we just thought it would be for finding bad guys. So I do think everyone should be building their ability to think a few more steps ahead, and it's great to do this in teams, preferably diverse teams, which can help with that process. On this question of computer vision, just in the last few months, Joseph Redmon, creator of YOLO, has said that he's no longer working on computer vision because he thinks the misuses so far outweigh the positives, and Timnit Gebru has said she's considering that as well. So there are times where you have to consider stepping away. And then there's also really actively thinking about what safeguards we need to put in place to address the misuses that are happening. Yes, I just wanted to say somebody really liked the Cathy O'Neil quote, the privileged are processed by people, the poor by algorithms, and they're looking forward to learning more; is Cathy O'Neil's book one that you recommend? Yes, yeah, and Cathy O'Neil is a fellow math PhD.
She has also written a number of good articles, and the book goes through a number of case studies of how algorithms are being used in different places. So, in summary: humans are biased, so why do we make a fuss about algorithmic bias? One, as we saw earlier, machine learning can create feedback loops: it's not just observing what's happening in the world, it's also determining outcomes and determining what future data will look like. Two, machine learning can amplify bias. Three, algorithms and humans are used very differently in practice. And then I'll also say: technology is power, and with that comes responsibility. Those of us with access to deep learning are still in a very fortunate and small percentage of the world that is able to use this technology right now, and I hope we will all use it responsibly and really take our power seriously. I just noticed the time, and we're about to start the next section on steps we can take, so this would be a good place to take a break; let's meet back in seven minutes at 7:45. Let's start back up. Actually, I was at a slightly different place than I thought, but here are a few questions you can ask about the projects you're working on, and I hope you will ask them. The first is: should we even be doing this? Consider that maybe there's some work we shouldn't do. There's a paper called When the Implication Is Not to Design. As engineers, we often tend to respond to problems with, what can I make or build to address this? But sometimes the answer is to not make or build anything. One example of research that I think has a huge amount of downside, and really no upside that I can see, is work to identify people's ethnicity, particularly for ethnic minorities.
There was work done on identifying Chinese Uighurs, the Muslim minority in Western China, over a million of whom have since been placed in internment camps, and I think this is a very, very harmful line of research. There have also been at least two attempts at building a classifier to try to identify someone's sexuality, which is probably just picking up on stylistic differences, but this is something that could also be quite dangerous, as in many countries it's illegal to be gay. Yes. So this is a question from me, which I don't know the answer to. There's an article whose title says a Stanford scientist built a gaydar using the "lamest" AI possible to prove a point, and my understanding is that the point was something like: hey, you could use fast.ai lesson one, and after an hour or two you can build this thing; anybody can do it. How do you feel about this idea that there's a role for demonstrating what's readily available with the technology we have? Yeah, that's the thing. So I appreciate that, and I'll talk about this a little later: OpenAI, with GPT-2, was, I think, trying to raise a debate around dual use and what responsible release of dual-use technology looks like, and what's a responsible way to raise awareness of what is possible. In the cases of the researchers who have done this on the sexuality question, to me it hasn't seemed like they've put adequate thought into how they're conducting the work and who they're collaborating with, to ensure it's something that helps address the problem. But I think you're right that there is probably some place for letting people know what is widely available now. Yeah, it reminds me a bit of pen testing in infosec.
There's an ethical way you can go about pointing out that it's trivially easy to break into somebody's system. Yes, I would agree that there is an ethical way, but I think that's something we as a community still have more work to do on, even in determining what that is. Other questions to consider: what bias is in the data? Something I should highlight: people often ask me, how can I debias my data, or ensure that it's bias-free? That's not possible; all data contains bias. The most important thing is to understand how your dataset was created and what its limitations are, so that you're not blindsided by that bias; you're never going to fully remove it. Some of the most promising approaches in this area are work like Timnit Gebru's Datasheets for Datasets, which goes through a bunch of questions about how your dataset was created, for what purposes, how it's being maintained, and what the risks are, to really make you aware of the context of your data. Can the code and data be audited? Particularly in the United States, we have a lot of issues when private companies create software that really impacts people, through the criminal justice system or hiring, and these are proprietary black boxes that are protected in court; this creates a lot of issues around what our rights are. Looking at error rates for different subgroups is really important, and that's what's so powerful about Joy Buolamwini's work: if she had just looked at light skin versus dark skin, and men versus women, she wouldn't have identified just how poorly the algorithms were doing on dark-skinned women. What is the accuracy of a simple rule-based alternative?
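The rule-based baseline check above can be made concrete with a trivial sketch: before trusting a complex model, compare it against the simplest rule available, such as always predicting the most common label. The data here is invented purely for illustration:

```python
# Toy sketch of the baseline check: if a complex model barely beats a
# trivial rule, that's a reason to question why it's being used.
# All data below is invented.
from collections import Counter

def majority_baseline_accuracy(y_true):
    """Accuracy of the trivial rule: always predict the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)

def accuracy(y_true, y_pred):
    """Plain accuracy of a model's predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0, 0, 0, 1, 0, 1, 0, 0]
model_preds = [0, 0, 1, 1, 0, 0, 0, 0]  # the "complex model's" output
print(majority_baseline_accuracy(y_true))  # 0.75
print(accuracy(y_true, model_preds))       # 0.75 -- no better than the rule
```

In the COMPAS case the analogous comparison was against a linear classifier on three variables rather than a majority rule, but the principle is the same: the baseline tells you how much the complexity is actually buying.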
This is something I think Jeremy talked about last week: it's good machine learning practice to have a baseline. But particularly in cases like COMPAS, where a 130-variable black box is not doing much better than a linear classifier on three variables, that raises the question of why we're using it at all. And then: what processes are in place to handle appeals or mistakes? Because there will be errors in the data, and there may be bugs in the implementation, and we need to have a process for recourse. Yes. Can you explain, this is from me now, sorry, I'm asking my own questions, nobody voted them up at all: what's the thinking behind this idea that, other things being equal, you should pick the simpler model? Is that what this baseline is for? And if so, what's the thinking behind that? Well, with the COMPAS recidivism algorithm, some of this for me is linked to the proprietary black-box nature, so you're right, maybe it would be different if we had a way to introspect it and had clear rights around appealing a decision. But I would say, yeah, why use the more complex thing if the simpler one works the same? And then: how diverse is the team that built it? I'll talk more about team diversity later in this lesson. This next question says Jeremy at the start, but I'm not the teacher, so I assume it's actually for Jeremy: do you think transfer learning makes this tougher, auditing the data that led to the initial model? I assumed they meant, Jeremy, please ask Rachel. No, they were asking you. That's a good question. Again, I think it's important to have information on both datasets: the initial dataset that was used, and the dataset you used to fine-tune. Do you have thoughts on that? What she said. And then I'll say: while bias and fairness, as well as accountability and transparency, are important,
they aren't everything. There's a great paper, A Mulching Proposal, by Os Keyes et al. They describe a system for turning the elderly into high-nutrient slurry, something that is clearly unethical, and then propose a way to do it that is fair, accountable, and transparent, and meets all of those qualifications. That shows some of the limitations of this framework, and it's also a good technique for inspecting whatever framework you are using: try to find something clearly unethical that could still meet the standards you've put forth. That technique, I really like it; it's my favorite technique from philosophy. It's this idea that you say, OK, given this premise, here's what it implies, and then you try to find an implied result which is intuitively clearly wrong. It's the number one philosophical thinking tool I got out of university, and sometimes you can have a lot of fun with it, like this time too. Thank you. All right, so the next big case study or topic I want to discuss is disinformation. In 2016, in Houston, a group called Heart of Texas posted about a protest outside an Islamic center, and they told people to come armed. Another Facebook group posted about a counter-protest, to show up supporting freedom of religion and inclusivity. There were a lot of people present at this, more on the side supporting freedom of religion, but a reporter for the Houston Chronicle noticed something odd: he was not able to get in touch with the organizers on either side. And it came out many months later that both sides had been organized by Russian trolls. So the people protesting were genuine Americans protesting for their beliefs, but they were doing it in a context that had been framed, completely disingenuously, by Russian operatives.
When thinking about disinformation, people often think about so-called fake news, inspecting a single post and asking, is this true or false? But really, disinformation is often about orchestrated campaigns of manipulation. It often involves seeds of truth (the best propaganda always involves kernels of truth, at least), it involves misleading context, and it can involve very sincere people who get swept up in it. A report came out this fall, an investigation from Stanford's Internet Observatory, where Renée DiResta and Alex Stamos work, on Russia's most recently identified disinformation campaign, which was operating in six different countries in Africa. It often purported to be local news sources. It was multi-platform: they were encouraging people to join their WhatsApp and Telegram groups, and they were hiring local people as reporters. A lot of the content was not necessarily disinformation; it was stuff on culture and sports and local weather. There was a lot of very pro-Russia coverage, but it covered a range of topics, and this is a very sophisticated phase of disinformation, in many cases hiring locals as reporters to work for these sites. And I should say, while I've just given two examples involving Russia, Russia certainly does not have a monopoly on disinformation; there are plenty of people involved in producing it. On a topical note, there's been a lot of disinformation around coronavirus and COVID-19. On a personal level, if you're looking for advice on spotting disinformation, or advice to share with loved ones, Mike Caulfield is a great person to follow. He tweets at @holden, and he has started an infodemic blog specifically about COVID-19.
He talks about his approach, and how people have been trained in school for 12 years with "here's a text, read it, use your critical thinking skills to figure out what you think about it," but professional fact checkers do the opposite: they get to a page, immediately get off of it, and look for higher-quality sources to see if they can find confirmation. Caulfield also really stresses that a lot of the critical thinking techniques that have been taught take a long time, and we're not going to spend 30 minutes evaluating each tweet we see in our Twitter stream. It's better to give people an approach they can do in 30 seconds. It's not going to be fail-proof if you're only spending 30 seconds, but it's better to check quickly than to have something that takes 30 minutes that you're just not going to do at all. So I wanted to put this out there as a resource; he's a professor, and he has a whole set of lessons at lessons.checkplease.cc. And in the data ethics course I'm teaching right now, I made my first lesson, the first half of which is specifically about coronavirus disinformation, available on YouTube. I've already shared it, and I'll add a link on the forums, if you want a lot more detail on disinformation than the short bit here. But going back to what disinformation is: it's important to think of it as an ecosystem. Again, it's not just a single post or single news story that is misleading or has false elements in it; it's really this broader ecosystem.
Claire Wardle of First Draft News, who is a leading expert on this and does a lot of training for journalists on how to report responsibly, talks about the "trumpet of amplification." This is where rumors or memes can start on 4chan and 8chan, then move to closed messaging groups such as WhatsApp, Telegram, or Facebook Messenger, from there to conspiracy communities on Reddit or YouTube, then to more mainstream social media, and finally get picked up by the professional media and politicians. This can make it very hard to address, because it's multi-platform; in many cases, campaigns may be exploiting the differing rules or loopholes between the different platforms. And I think we're certainly seeing more and more examples where it doesn't have to go through all these steps but can jump forward. Online discussion is very significant because it helps us form our opinions. This is tough, because most of us think of ourselves as pretty independent-minded, but discussion really does influence us: we have all these social tendencies to be swayed by people in our in-group and to react in opposition to people in our out-group, so online discussion impacts us. People discuss all sorts of things online. Here's a Reddit discussion about whether the US should cut defense spending, and you have comments like: "You're wrong, the defense budget is a good example of how badly the US spends money on the military." Someone else says, "Yeah, but that's already happening; there's a huge increase in the military budget, the Pentagon budget is already increasing." "I didn't mean to sound like 'stop paying for the military.' I'm not saying that we cannot pay the bills, but I think it would make sense to cut defense spending." Does anyone want to guess what subreddit this is from? Unpopular Opinion? News? Change My View? Net Neutrality? Those are good guesses, but they're wrong.
I love the way you say no. This is all from the SubSimulatorGPT2 subreddit, so these comments were all written by GPT-2. And this is in good fun; it was clearly labeled on the subreddit that it's coming from GPT-2. GPT-2 is a language model from OpenAI that was part of a trajectory of research that many groups were on. It was released about a year ago. Should I read the unicorn story, Jeremy? Okay, many of you have probably seen this. This sample was cherry-picked, but it's still very impressive. A human-written prompt was given to the language model: "In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English." And then the next part was all generated by the language model, so this is a deep learning model that produced this. Reading from the generated continuation, which introduces a Dr. Jorge Pérez: "Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. Pérez and the others then ventured further into the valley. 'By the time we reached the top of one peak, the water looked blue, with some crystals on top,' said Pérez. Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them. They were so close they could touch their horns. While examining these bizarre creatures, the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, 'We can see, for example, that they have a common language, something like a dialect or dialectic.'" So I think this is really compelling prose to have been generated by a computer. We've also seen advances in computers generating pictures, specifically GANs. So, Katie Jones was listed on LinkedIn as a Russia and Eurasia fellow.
She was connected to several people from mainstream Washington think tanks, and the Associated Press discovered that she is not a real person: her photo was generated by a GAN. And it's kind of scary when we start thinking about how compelling the generated text is, and then combine that with pictures. These photos are all from thispersondoesnotexist.com, generated by GANs. There's a very real and imminent risk that online discussion will be swamped with fake, manipulative agents to an even greater extent than it already has been, and that this can be used to influence public opinion. Going back in time to 2017: the FCC was considering repealing net neutrality, and they opened up for comments to see how Americans felt about it. This is a sample of many of the comments that were opposed to net neutrality and wanted to repeal it. I'll just read a few clips: "Americans, as opposed to Washington bureaucrats, deserve to enjoy the services they desire." "Individual citizens, as opposed to Washington bureaucrats, should be able to select whichever services they desire." "People like me, as opposed to so-called experts, should be free to buy whatever products they choose." These have been helpfully color-coded so you can see the pattern: this was a bit of a mad libs, where you had a few choices in green for the first noun; then "as opposed to" or "rather than"; and then in orange, either "Washington bureaucrats," "so-called experts," "the FCC," and so on. This analysis was done by Jeff Kao, who's now a computational journalist at ProPublica doing great work. He did the analysis uncovering this campaign, in which the comments were designed to look unique but had been created through mail-merge-style mad libs. So this was great work by Jeff.
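The core idea of that analysis can be sketched very roughly (Kao's real work used far more sophisticated clustering; the slot phrases below are just the examples read out above, and the comments are paraphrased for illustration) by normalizing each comment so that interchangeable "mad libs" phrases collapse to the same template, then counting duplicates:

```python
import re
from collections import Counter

# Hypothetical slot phrases, taken from the sample comments above.
SLOTS = {
    "washington bureaucrats": "<ACTOR>",
    "so-called experts": "<ACTOR>",
    "the fcc": "<ACTOR>",
    "americans": "<SUBJECT>",
    "individual citizens": "<SUBJECT>",
    "people like me": "<SUBJECT>",
}

def normalize(comment):
    # Lowercase, strip punctuation, collapse known slot phrases.
    text = re.sub(r"[^\w\s]", "", comment.lower())
    for phrase, slot in SLOTS.items():
        text = text.replace(re.sub(r"[^\w\s]", "", phrase), slot)
    return re.sub(r"\s+", " ", text).strip()

comments = [
    "Americans, as opposed to Washington bureaucrats, deserve the services they desire.",
    "Individual citizens, as opposed to so-called experts, deserve the services they desire.",
    "People like me, as opposed to the FCC, deserve the services they desire.",
    "I strongly support keeping the existing net neutrality rules in place.",
]

# Three of the four comments collapse to one shared template.
for template, count in Counter(normalize(c) for c in comments).most_common():
    print(count, template)
```

Counting how many comments share a normalized template is one simple way to separate "truly unique" comments from templated ones, which is the kind of distinction behind the 4% figure discussed next.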
He found that while the FCC received over 22 million comments, less than 4% of them were truly unique. And this is not all malicious activity; there are many legitimate cases where you get a template to contact your legislator about something. But as in the example shown previously, these were designed to look unique when they weren't, and more than 99% of the truly unique comments wanted to keep net neutrality. That was not the case if you looked at the full 22 million comments. Now, this was in 2017, which may not sound that long ago, but in the field of natural language processing we've had an entire revolution since then; there's just been so much progress. This analysis would be, I think, virtually impossible today if someone were using a sophisticated language model to generate the comments. So Jess asks a question, which I'm going to treat as a two-part question even if it's not necessarily: what happens when there's so much AI trolling that most of what gets scraped from the web is AI-generated text? And then the second part: what happens when you use that to generate more AI-generated text? For the first part: yes, this is a real challenge we're facing, where real humans can get drowned out when so much text is AI trolling. In the interest of time (I could talk about disinformation for hours, and I had to cut a lot of material), I'll just say that many people have talked about how the new form of censorship is about drowning people out. It's not necessarily forbidding someone from saying something, but totally drowning them out with a massive quantity of text and information and comments, and AI can really facilitate that. I do not have a good solution to that.
In terms of AI learning from AI-generated text: I think you're going to get systems that are potentially less and less relevant to humans, and that may have harmful effects if they're being used to create software that interacts with or impacts humans. So that's a concern. One of the things I find fascinating about this is that we could get to a point where 99.99% of tweets, or fast.ai forum posts, or whatever, are auto-generated, particularly in more political places where a lot of the content is pretty low-value. The thing is, if it were actually good, you wouldn't even know. So what if I told you that 75% of the people you're talking to on the forum right now are actually bots? Which ones are they? How would you prove whether I'm right or wrong? Yeah, and I think this is a real issue on Twitter, particularly with people you don't know: wondering, is this an actual person or a bot? That's a common question, and it can be hard to tell. But I think it has a lot of significance for how human governance works. There's something about humans being in a society, with norms and rules and mechanisms, that this can really undermine and make difficult. And so when GPT-2 came out, Jeremy Howard, co-founder of fast.ai, was quoted in the Verge article on it: "I've been trying to warn people about this for a while. We have the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter." So one step towards addressing this is the need for digital signatures. Oren Etzioni, the head of the Allen Institute for AI, wrote about this in HBR: "Recent developments in AI point to an age where forgery of documents, pictures, audio recordings, videos, and online identities will occur with unprecedented ease."
"AI is poised to make high-fidelity forgery inexpensive and automated, leading to potentially disastrous consequences for democracy, security, and society." He proposes digital signatures as a means of authentication. And I will say here, one of the additional risks of all this forgery and fakery is that it also undermines people speaking the truth. Zeynep Tufekci, who does a lot of research on protests around the world and on different social movements, has said that she's often approached by whistleblowers and dissidents who in many cases will risk their lives to try to publicize a wrongdoing or human rights violation, only to have bad actors say, "oh, that picture was photoshopped, that was faked." It's now a big issue for whistleblowers and dissidents: how can they verify what they are saying? There's a real need for verification. And then, someone you should definitely be following on this topic is Renée DiResta. She wrote a great article with Mike Godwin last year, framing disinformation as something we really need to think of as a cybersecurity problem: coordinated campaigns of manipulation by bad actors. There's also some important work happening at Stanford on this. All right, questions on disinformation? Okay, so next up: ethical foundations. In the fast.ai approach, we always like to ground everything in real-world case studies before we get to the theory underpinning it, and I'm not going to go too deep on this at all. There's a fun article, "What Would an Avenger Do?", and hat tip to Casey Fiesler for suggesting it. It goes through three common ethical philosophies: utilitarianism, with Iron Man as the example, trying to maximize good;
deontological ethics, with Captain America as an example of adhering to the right; and virtue ethics, with Thor living by a code of honor. So I thought that was a nice reading. Yes? "Where do you stand on the argument that social media companies are just neutral platforms, and that problematic content is the entire responsibility of the users, the same way that phone companies aren't held responsible when phones are used for scams, or car companies aren't held responsible when vehicles are used for, say, terrorist attacks?" I do not think that the platforms are neutral, because they make a number of design decisions and enforcement decisions, around even what their terms of service are and how those are enforced. And keep in mind that harassment can drive many people off of platforms, so with many of those decisions, it's not that everybody gets free speech when there's no enforcement; it just changes who is silenced. I do think there are a lot of really difficult questions raised here, because the platforms are not publishers, but they are in this intermediate area where they're performing many of the functions that publishers used to perform. A newspaper curated which articles appeared in it, which is not what platforms do, but they are getting closer to that. Something I come back to is that it is an uncomfortable amount of power for private companies to have. So it does raise a lot of difficult decisions, but I do not believe that they are neutral. So for this part, I mentioned the Markkula Center earlier; definitely check out their site, Ethics in Technology Practice, which has a lot of useful resources.
And I'm going to go through these relatively quickly, just as examples. They give some deontological questions that technologists can ask. Deontological ethics is where you have various rights or duties that you might want to respect, which can include principles like privacy or autonomy. How might the dignity and autonomy of each stakeholder be impacted by this project? What considerations of trust and of justice are relevant? Does this project involve any conflicting moral duties to others? In some cases, there will be conflicts between the different rights or duties you're considering. So this is an example, and they have more in the reading, of the types of questions you could be asking when evaluating whether a project is ethical. Then there are consequentialist questions (consequentialism includes utilitarianism as well as the common good): Who will be directly affected? Who will be indirectly affected? Will the effects in aggregate create more good than harm, and what types of good and harm? Are you thinking about all the relevant types of harm and benefit, including psychological, political, environmental, moral, cognitive, emotional, institutional, and cultural? Are you looking at long-term benefits and harms, and who experiences them? Will the risks of harm fall disproportionately on the least powerful? Who will accrue the benefits? Have you considered dual use? These are, again, questions you could use when trying to evaluate a project, and the Markkula Center's recommendation is that this is a great activity to do as a team and as a group. I was going to say, I can't overstate how useful this tool is. You might think, oh, it's just a set of questions.
But to me, this is the big-gun tool for how you handle this: somebody is helping you think about the right set of questions, and then you go through them with a diverse group of people and discuss them. That's gold. So go back and reread these; don't just skip over them. Take them to work, and use them next time you're talking about a project; they're a really great set of questions, a great tool in your toolbox. Yeah, and the original reading has even more detail and elaboration on the questions. Then they give a summary of five potential ethical lenses. The rights approach: which option best respects the rights of all who have a stake? The justice approach: which option treats people equally or proportionally? (These two are both deontological.) The utilitarian approach: which option will produce the most good and do the least harm? The common good approach: which option best serves the community as a whole, not just some members? (Three and four are both consequentialist.) And the virtue approach: which option leads me to act as the sort of person I want to be? That one can involve particular virtues: do you value trustworthiness, or truth, or courage? A great activity, if this is something you're studying or talking about at work with your teammates: the Markkula Center has a number of case studies that you can talk through, and they'll even ask you to evaluate them through these five lenses and see how that impacts your take on what the right thing to do is. It's kind of weird for a computer programmer or data scientist, in some ways, to think of these as tools like fastai or pandas or whatever, but they absolutely are.
These are like software tools for your brain, to help you debug your thinking. Great, thank you. And then, as someone brought up earlier, that was a very Western-centric intro to ethical philosophy. There are other ethical lenses in other cultures, and I've been doing some reading, particularly on the Maori worldview. I don't feel confident enough in my understanding to represent it, but it's very good to be mindful that there are other ethical lenses out there. And I do very much think that for the people being impacted by a technology, their ethical lenses are what matter, and this is a particular issue when we have so many multinational corporations. There's an interesting project going on in New Zealand now, where the New Zealand government is considering its AI approach and is, at least ostensibly, wanting to include the Maori view in that. So that's a little bit of theory, but now I want to talk about some practices you can implement in the workplace. Again, this is from the Markkula Center; this is their ethics toolkit, which I particularly like. I'm not going to go through all of the tools; I'll just tell you a few of my favorites. Tool 1 is ethical risk sweeping, and this I think is similar to the idea of pen testing that Jeremy mentioned earlier, from security. Have regularly scheduled ethical risk sweeps. While "no vulnerabilities found" is generally good news, that doesn't mean it was a wasted effort, and you keep doing it: keep looking for ethical risks, and assume that you missed some risks in the initial project development. Also, you have to set up the incentives properly, so that you're rewarding team members for spotting new ethical risks. Right. Yeah, so I've got some comments here.
"So my comment here is about the learning rate finder, and I'm not going to bother with the exact mathematical definition, partly because I'm a terrible mathematician and partly because it doesn't matter. But if you just remember..." Oh sorry, that's actually not me; I'm reading something that Patty Hendricks generated by training a language model on me. Reading the language model of me. That was great. The real me! Well, thank you. So that was Tool 1. I would say another example of this is red teaming: having a team within your org that's trying to find your vulnerabilities. Tool 3 is another one I really like: expanding the ethical circle. Whose interests, desires, skills, experiences, and values have we simply assumed, rather than actually consulted? Who are all the stakeholders who will be directly affected, and have we actually asked them what their interests are? Who might use this product that we didn't expect to use it, or use it for purposes we didn't initially intend? A great implementation of this comes from the University of Washington's Tech Policy Lab. They did a project called Diverse Voices, and it's neat: they have both an academic paper on it and a lengthy guide on how you would implement it. The idea is how to organize expert panels around a new technology. They did a few examples: one was on augmented reality, where they held expert panels with people with disabilities, people who are formerly or currently incarcerated, and women, to get their input and make sure it was included. They did a second one on an autonomous vehicle strategy document, and organized expert panels with youth, with people who don't drive cars, and with extremely low-income people.
So I think this is a great guide if you're unsure how to even go about setting something like this up: expanding your circle, including more people, and getting perspectives that may be underrepresented among your own employees. I just want to let you know that this resource is out there. Tool 6 is: think about the terrible people. This can be hard, because we're often thinking positively, or thinking about people like ourselves who don't have terrible intentions. But really think about who might want to abuse, steal, misinterpret, hack, destroy, or weaponize what we build. Who will use it with alarming stupidity or irrationality? What rewards, incentives, or openings has our design inadvertently created for those people? Remembering back to the section on metrics: how are people going to try to game or manipulate this, and how can we then remove those rewards or incentives? This is an important step to take. And then Tool 7 is closing the loop: ethical feedback and iteration. Remember this is never a finished task; identify feedback channels that will give you reliable data; integrate this process with quality management and user support; and develop formal procedures and chains of responsibility for ethical iteration. This tool reminded me of a blog post by Alex Feerst that I really like. Alex Feerst was previously the chief legal officer at Medium, and about a year ago he interviewed something like 15 or 20 people who have worked in trust and safety (trust and safety includes content moderation, although it's not solely content moderation). One of the ideas that came up that I really liked was from one of those people, many of whom have worked in trust and safety for years at big-name companies.
One of them said: "The separation of product people and trust people worries me, because in a world where product managers and engineers and visionaries cared about this stuff, it would be baked into how things get built. If things stay this way, where product and engineering are Mozart and everyone else is Alfred the butler, the big stuff is not going to change." At least two people in this piece talk about needing to better integrate trust and safety, who are often on the front lines of seeing abuse and misuse of a technology product, more closely with product and engineering, so that this knowledge can be more directly incorporated, and you can have a tighter feedback loop about what's going wrong and how it can be designed against. Okay, so those were a few blog posts and pieces of research I thought relevant, inspired by the Markkula Center's tools for tech ethics, and hopefully those are practices you could think about implementing at your company. Next I want to get into diversity, which I know came up earlier. Only 12% of machine learning researchers are women, which is a very dire statistic, and there's also an extreme lack of racial diversity, age diversity, and diversity along other dimensions. This is significant. As a positive example of what diversity can help with: in a post, Tracy Chou, who was an early engineer at Quora (I think one of the first five employees) and later at Pinterest, wrote: "The first feature I built when I worked at Quora was the block button. I was eager to work on the feature because I personally felt antagonized and abused on the site." She goes on to say that if she hadn't been there, they might not have added the block button as soon, and so that's a direct example of how having a diverse team can help.
My key advice for anyone wanting to increase diversity is to start at the opposite end of the pipeline from where people usually focus: the workplace. I wrote a blog post five years ago, "If you think women in tech is just a pipeline problem, you haven't been paying attention." This was the most popular thing I had ever written, until Jeremy and I wrote the COVID-19 post last month, so it's now the second most popular thing I've written. I linked to a ton of research in there. A key statistic to understand is that 41% of women working in tech end up leaving the field, compared to 17% of men. So recruiting more girls into coding or tech is not going to address this problem if they keep leaving at very high rates. I just had a little peek at the YouTube chat, and I see people are asking questions there. I want to remind people that Rachel and I do not look at that; if you want to ask questions, use the forum thread. And if you see questions that you like, please vote them up, such as this one: "How about an ethical issue bounty program, just like the bug bounty programs that some companies have?" I think that's a neat idea: rewarding people for finding ethical issues. And the reason that women are more likely to leave tech, as found in a meta-analysis of over 200 books, white papers, and articles: women leave the tech industry because they're treated unfairly, underpaid, less likely to be fast-tracked than their male colleagues, and unable to advance. Too often, diversity efforts end up focusing only on white women, which is wrong. Interviews with 60 women of color who work in STEM research found that 100% had experienced discrimination, and the particular stereotypes they faced varied by race. So I would say it's very important to make women of color a top priority in diversity efforts.
A study found that men's voices are perceived as more persuasive, fact-based, and logical than women's voices, even when reading identical scripts. Researchers found that women receive more vague feedback and personality criticism in performance evaluations, whereas men are more likely to receive actionable advice tied to concrete business outcomes. When women receive mentorship, it's often advice on how they should change and gain more self-knowledge; when men receive mentorship, it's public endorsement of their authority. Only one of these has been statistically linked to getting promoted. All these studies are linked in another post I wrote, "The real reason women quit tech and how to address it." Is that a question, Jeremy? Yeah, so if you're interested, those two blog posts link to a ton of relevant research on this. And I think the workplace is the place to start in addressing these things. Another issue: tech interviews are terrible for everyone. Now, stepping one step back from people who are already in your workplace to the interview process: I wrote a post on how to make tech interviews a little less awful and went through a ton of research. I will say that the interview problem is a hard one; it's very time-consuming and hard to interview people well. But the two most interesting pieces of research I came across: one was from Triplebyte, a recruiting company that does a first-round technical interview for candidates. It's a Y Combinator company, and its candidates then interview at Y Combinator companies. So they have this very interesting data set, where they've given everybody the same technical interview, and then they can see which companies people got offers from, when they were interviewing at many of the same companies.
And the number one finding from their research is that the types of programmers that each company looks for often have little to do with what the company needs or does. Rather, they reflect company culture and the backgrounds of the founders. This is something where they even gave the advice of, if you're job hunting, try to look for companies where the founders have a similar background to you. And while that makes sense, it's going to be much easier for certain people to do than for others; in particular, given the gender and racial disparities in VC funding, that's going to make a big difference. Yes. Actually, I would say that was the most common advice I heard from VCs when I became a founder in the Bay Area: when recruiting, focus on getting people from your network, people that are as like-minded and similar as possible. That was by far the most common advice that I heard. Yeah, and this is maybe one of my controversial opinions. I get why people hire from their network, and I think that long term we all, and particularly white people, need to be developing more diverse networks. That's like a 10-year project; it's not something you can do right when you're hiring, but it's really about developing a diverse network of friends and trusted acquaintances over time. But yeah, thank you for that perspective too, Jeremy. Then the other study I found really interesting was one where they gave people resumes. One resume had more academic qualifications and one had more practical experience, and then they switched the gender: one had a male name and one a female name.
And basically people were more likely to hire the male candidate, and then they would use a post hoc justification of, oh well, I chose him because he had more academic experience, or, I chose him because he had more practical experience. And it's very human to use post hoc justifications, but it's a real risk that definitely shows up in hiring. Question: ultimately, AI or any other technology is developed or implemented by companies for financial advantage, or more profit. Maybe the best way to incentivize ethical behavior is to tie financial or reputational risk to bad behavior, in some ways similar to how companies are now investing in cybersecurity because they don't want to be the next Equifax. Can grassroots campaigns help push companies toward better ethical behavior with regard to their use of AI? That's a good question, yeah, and I think there are a lot of analogies with cybersecurity. I know that for a long time people had trouble making the case to their bosses of why they should be investing in cybersecurity, particularly because cybersecurity is something where, when it's working well, you don't notice it. And so it can be hard to build the case. So I think that there is a place for grassroots campaigns. I'm going to talk more about policy in a bit. It can be hard in some of these cases where there are not necessarily meaningful alternatives, so I do think monopolies can make that harder. That's a good question. All right, so the next item on this slide is the need for policy. And I'm going to start with a case study of one thing that gets companies to take action. As I mentioned earlier, an investigator for the UN found that Facebook played a determining role in the Rohingya genocide. I think the best article I've read on this was by Timothy McLaughlin, who did a super in-depth dive on Facebook's role in Myanmar.
And people warned Facebook executives in 2013, and in 2014, and in 2015 about how the platform was being used to spread hate speech and to incite violence. One person in 2015 even told Facebook executives that Facebook could play the same role in Myanmar that radio broadcasts played during the Rwandan genocide, and radio broadcasts played a terrible and pivotal role in the Rwandan genocide. Somebody close to it said, that's not 20/20 hindsight; the scale of this problem was significant and it was already apparent. And despite this, in 2015 I believe Facebook only had four contractors who even spoke Burmese, the language of Myanmar. Question, that's an interesting one: how do you think about our opportunity to correct biases in artificial systems versus the behaviors we see in humans? For example, a sentencing algorithm can be monitored and adjusted, versus a specific biased judge who remains in their role for a long time. Well, theoretically, though I feel a bit hesitant about the idea that it will be easier to correct bias in algorithms, because you still need people making the decisions to prioritize that; it requires an overhaul of the system's priorities, I think. It also starts with the premise that there are people who can't be fired or disciplined or whatever. I guess maybe for some judges that's true, but that maybe suggests that judges shouldn't be lifetime appointments. Yeah, because even then I think you need the change of heart of the people advocating for the new system, which would be necessary in either case; that's the critical piece, getting the people that want to overhaul the values of a system. So, returning to this issue of the Rohingya genocide, and this is a continuing issue.
Yeah, this is something that's just really stunning to me: that there were so many warnings, that so many people tried to raise an alarm on this, and that so little action was taken. And even last year, or this was probably two years ago, Zuckerberg finally said that Facebook would add dozens of Burmese-language content reviewers, but this was after the genocide was already happening. So that's how Facebook failed to respond in any significant way in Myanmar. In contrast, Germany passed a much stricter law about hate speech, NetzDG, where the potential penalty would be up to 50 million euros. Facebook hired 1,200 people in under a year because they were so worried about this penalty. And I'm not saying this is a law we want to replicate, but here I'm just illustrating the difference between being told that you're playing a determining role in a genocide versus facing a significant financial penalty. We have seen what the one thing that makes Facebook take action is. And so I think that is really significant, remembering the power of a credible threat of a significant fine, and it has to be a lot more than just a cost of doing business. So I really believe that we need both policy and ethical behavior within industry. I think that policy is the appropriate tool for addressing negative externalities, misaligned economic incentives, and race-to-the-bottom situations, and for enforcing accountability. However, ethical behavior of individuals, of data scientists and software engineers working in industry, is very much necessary as well, because the law is not always going to keep up; it's not going to cover all the edge cases. We really need the people in industry to be making ethical decisions as well. And so I believe both are significant and important.
And then something to note here is that there are many, many examples of AI ethics issues. I haven't talked about all of these, but there was Amazon's facial recognition: the ACLU did a study finding that it incorrectly matched 28 members of Congress to criminal mug shots, and this disproportionately included Congresspeople of color. And there's also, the article was good but the story is terrible, a city that's using an IBM dashboard for predictive policing, where a city official said, oh, whenever you have machine learning it's always 99% accurate, which is false and quite concerning. We had the issue where in 2016 ProPublica discovered that you could place a housing ad on Facebook and say, I don't want Latino or black people, or I don't want wheelchair users, to see this housing ad, which seems like a violation of the Fair Housing Act. And so there's this article where Facebook was like, we're so sorry, and then over a year later it was still going on; ProPublica went back and wrote another article about it. There's also the issue of dozens of companies placing job ads on Facebook and saying, we only want young people to see this. And there's Amazon creating the recruiting tool that penalized resumes that had the word "women's" in them. Something to note about these examples, and many of the examples we've talked about today, is that many of them are about human rights and civil rights. There's a good article by Dominique Harrison of the Aspen Institute on this. And I agree with Anil Dash's framing; he wrote that there is no technology industry anymore: tech is being used in every industry. And so I think in particular we need to consider human rights and civil rights, such as housing, education, employment, criminal justice, voting, and medical care, and think about what rights we want to safeguard, and I do think policy is the appropriate way to do that.
And I mean, it's very easy to be discouraged about regulation, but I think sometimes we overlook the positives, the cases where it's worked well. And so something I really liked about "Datasheets for Datasets" by Timnit Gebru et al. is that they go through three case studies of how standardization and regulation came to different industries. There's the electronics industry, around circuits and resistors, where it's about standardizing what the specs are and what you write down about them; the pharmaceutical industry; and car safety. None of these are perfect, but the case studies there were very illuminating. In particular, I got very interested in the car safety one, and there's also a great 99% Invisible episode, this is a design podcast, about it. So some things I learned: early cars had sharp metal knobs on the dashboard that could lodge in people's skulls in a crash. Non-collapsible steering columns would frequently impale drivers. And even after the collapsible steering column was invented, it wasn't actually implemented, because there was no economic incentive to do so. But the collapsible steering column has, they said, saved more lives than anything other than the seat belt when it comes to car safety. And there was also this widespread belief that cars were dangerous because of the people driving them. It took consumer safety advocates decades to even change the culture of discussion around this, to start gathering and tracking the data, and to put more of an onus on car companies around safety. GM hired a private detective to trail Ralph Nader and try to dig up dirt on him. And so this was really a battle, one whose outcome I take for granted now, and it shows how much it can take to move the needle there.
And then a more recent issue is that it wasn't until, I believe, 2011 that crash test dummies were required to start representing average female anatomy. Previously, crash test dummies were just modeled on men, and in a crash of the same impact women were 40 percent more likely to be injured than men, because that's who the cars were being designed for. So I thought all this was very interesting, and it can be helpful to remember some of the successes we've had. Another area that's very relevant is environmental protections. Maciej Cegłowski has a great article on this, but just remember that in the U.S. we had rivers that would catch on fire, and London had terrible smog, and these are things that would not have been possible to solve as an individual; we really needed coordinated regulation. All right, and then on a closing note: I think a lot of the problems I've touched on tonight are really huge and difficult problems, and they're often very complicated. I go into more detail on this in the course, so please check out the course once it's released. I always try to offer some steps towards solutions, but I realize they're not always as satisfying as I would like, not a "this is going to solve it," and that's because these are really difficult problems. And Julia Angwin, a former journalist from ProPublica and now the editor-in-chief of The Markup, gave a really great interview on privacy last year that I liked and found very encouraging. She said, "I strongly believe that in order to solve a problem, you have to diagnose it, and that we're still in the diagnosis phase of this."
"If you think about the turn of the century and industrialization, we had, I don't know, 30 years of child labor, unlimited work hours, terrible working conditions. And it took a lot of journalists muckraking and advocacy to diagnose the problem and have some understanding of what it was, and then the activism to get laws changed. I see my role as trying to make as clear as possible what the downsides are, and diagnosing them really accurately so that they can be solvable. That's hard work, and lots more people need to be doing it." And I find that really encouraging, and I do think we should be working towards solutions, but at this point even better diagnosing and understanding the complex problems we're facing is valuable work. A couple of people are very keen to see your full course on ethics; is that something that they might be able to attend or buy? So, it will be released for free at some point this summer. There was a paid in-person version offered at the Data Institute as a certificate, similar to how this course was supposed to be offered in person; the data ethics one was in person and took place in January and February. I'm currently teaching a version for the master's in data science students at USF, and I will be releasing the free online version sometime before July. Thank you. I'll see you next time.