Welcome to the 10th Data, AI and Society seminar. I'd like to start by acknowledging the traditional custodians of country throughout Australia and recognizing their continuing connection to land, waters and sky. I pay my respects to their elders past, present and emerging. We're so pleased with how this seminar series has been going, with the incredible scholars we've been able to bring to this time zone, and I hope that the discussions we've started here will continue as we try to find ways to forge democratically legitimate data and AI systems. And on this, our 10th seminar, a bit of a landmark moment, I'm thrilled to welcome Rediet Abebe, an inspirational scholar who is heading to Berkeley, currently at the Harvard Society of Fellows, and, to complete that distinguished roundup of institutions, holds a PhD from Cornell. Her PhD was awarded the ACM SIGKDD Dissertation Award. She's the co-founder of the important organization Black in AI, and her talk is going to be on roles for computing in social justice. So Rediet, if you'd like to share your screen, you'll have about 30 minutes for your talk, and then we'll go on to questions. Thanks so much for being here.

Well, 30 minutes for my talk. Okay, all right, we'll do 30 minutes then. Let me just make sure that you can... Okay, I'm going to share my screen, and just confirming that it's advancing. Yes? Okay, very good. Thanks for confirming that. Thank you so much to the organizers of the seminar for inviting me. In fact, it sounds like a lot of my friends have spoken in this seminar, so it's going to be, I'm sure, a very tough act to follow, but I'm very happy to be sharing some of this work with you all. So I'll just get started.
So in many ways, as you know, the world looks very different than it did several years ago, even a year ago, even a couple of months ago. At least here in the US, we have millions of people who have been infected by the COVID virus and hundreds of thousands of people who've died because of it. And this figure that you see here is from several weeks ago, so I'm sure it's severely outdated; this is a very much ongoing issue in the US. In the US we're also dealing with a lot of police violence, and attacks on immigrants have reached a breaking point. So we have a lot of things happening at the same time, and they're intersecting with one another in unsustainable and crushing ways. If you look at the pandemic, for instance, we know that Black Americans, American Indians, and Hispanic and Latino people are three to five times as likely to be infected by the virus, to be hospitalized, and to die from it. And likewise, US immigrants, who constitute a large portion of our essential workforce, are entirely left out of programs aimed at assisting families in these incredibly desperate times. So you might ask: what are we going to do about this? And if you, like me, are trained in computer science, and you have techniques in data science and algorithms and optimization and this constellation of techniques that you're trained in and excited about, you might want to use those skills to try to alleviate some of the issues I've mentioned, and the many others that predate these particular examples. But of course, we've also seen attempts to use algorithmic techniques to support society really backfire.
We now have endless examples where algorithms that were meant to increase efficiency, or remove human bias, or whatever it is we wanted to do, actually ossify and exacerbate existing biases, discrimination, or inequality. Even algorithms that were designed with vulnerable communities in mind can backfire. This is covered in the work of Virginia Eubanks in Automating Inequality, among many other places you might know of. And of course, this crowd knows that there are many different pitfalls to using computing for social good. The three categories I would put these pitfalls in are: solutionism, which you've heard of, the tendency to assume that computational techniques will solve our social problems; a sort of tinkering, where we take a lot of problematic social and political systems as fixed, just assume they're given, and try to optimize around them; and a sort of diversion, where we get so fixated on a particular type of problem or set of computational techniques that we get distracted from the root of the problem and from what other ways of addressing it might be out there. All of this gives one pause. It gave me pause when I was a PhD student. At some point I wondered: should we just throw out all of our techniques and maybe become sociologists, or join some other discipline that might be useful? This was really a genuine question I was asking myself, and a genuine question my colleagues at Cornell and I kept discussing, because we had these techniques that we like, but we also saw these deep-rooted social problems and wanted to help make progress on them.
So what happened was that we had several conversations over a period of several years that eventually led to this paper, which synthesizes a lot of what we think gives evidence of computing being a constructive ally in bringing about meaningful social changes, not minor ones, but meaningful changes that we think will move things in the right direction. This is joint work with the collaborators you see there, most of whom are still at Cornell and are amazing scholars whose work you should follow. In this work, we started with the premise that social change is really the work of many hands. It's not going to be computing that solves everything, but computing can play a role by working in harmony with other activities to make progress toward things that are important. I'm not going to cover everything in the interest of time, but we had a framework through which we were thinking about this, and I'll give you a sampling of some of the roles we identified that computing can play toward this goal. One of these is the role of computing as a diagnostic. This is one that comes very naturally to us. It says that these social problems are out there, and maybe we're not going to solve them, but just understanding them, and the mechanisms that perpetuate them, can itself play an important role. Especially when these social problems are manifesting themselves in technical systems, measuring and precisely characterizing them can play a very important role. Just as an example, think back to some of your favorite works in this area. One that comes immediately to mind for many people is the work of Professor Latanya Sweeney, a computer scientist and political scientist at Harvard, who has this paper from 2013, which, in computer science years, is like a century ago, right?
She has this paper from 2013 showing that when you search for names commonly associated with African American individuals, you're more likely to see ads that are, let's say, arrest-related, or about defaulting on loans, things associated with a negative socioeconomic standing. You see as an example the name of the author herself, Latanya Sweeney. And this is not the only work with this kind of diagnostic flavor. If you think back to other work you've seen, work on facial recognition and facial analysis technologies, work on bias in word embeddings, racial bias in healthcare algorithms, there are all these different examples you can think of. Many of these are not necessarily introducing a solution per se; they're measuring and characterizing the extent of a problem as it manifests in various technical systems. To give you an example from my own work, this is work I did with colleagues at Microsoft Research and Stony Brook University a couple of years back. What we were doing here was we were really interested in this idea of data inequalities: there are groups of individuals that are not adequately represented in data sets, or that might be misrepresented in data sets, and that really inhibits our ability to make interventions to support these communities. If you're thinking about HIV/AIDS, for instance, within the African continent, we know that there's just very little data available. Even things like death certificates tend to contain a lot of really serious errors that make them essentially unusable. So if you're trying to think about, let's say, what information people are seeking, that's very nuanced information.
You're not going to get that data. And so we're in this position where we have so little information about what information and knowledge people have, what knowledge they're lacking, and what misinformation might be out there, that we can't adequately target educational and information-based interventions to effectively combat the burden of this disease. What my colleagues and I wanted to do in this setting was to use search data as a way to bridge some of this gap, some of this inequality that exists in the accessibility of high-quality data on individuals across all 54 nations in Africa. We specifically focused on searches related to HIV and AIDS over a period of 18 months, and we were able to extract different topics in which people are interested. Some of these were things you would expect: people asking about symptoms, drugs, breastfeeding with HIV. But there were other things that we know are hard to survey. People asking about stigma and discrimination, for instance: individuals asking things like, can my boss fire me if I'm HIV positive? Also questions related to natural cures and remedies, like, does garlic or black seed oil or whatever you want to put in here cure HIV and AIDS? And does prophet XYZ cure HIV and AIDS, and things like that. These are things that are very well known to be difficult to survey, because it's a stigmatized condition and people can't necessarily talk about it freely. And yet these questions are right there in our data set. So here you're seeing the different topics that emerged, but you can actually dig deeper and look at representative individual search queries as well, to really get a sense of what's going on. Here you can see those related to natural cures and remedies.
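One very rough way to picture the topic-extraction step described above is simple keyword bucketing. To be clear, this is only a toy sketch: the actual study used more careful methods, and while the topic names below follow the talk, the keyword lists and example queries are hypothetical.

```python
# Toy sketch: bucket search queries into topics by keyword matching.
# Topic names follow the talk; the keyword lists and sample queries
# are invented for illustration, not taken from the study.

TOPIC_KEYWORDS = {
    "natural cures and remedies": ["garlic", "black seed", "moringa", "cure"],
    "stigma and discrimination": ["fire me", "boss", "tell my employer"],
    "drugs": ["antiretroviral", "dosage"],
    "breastfeeding": ["breastfeed", "breast milk"],
}

def bucket_query(query):
    """Assign a query to the first topic whose keywords it mentions."""
    q = query.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return topic
    return "other"

queries = [
    "does garlic cure hiv",
    "can my boss fire me if i am hiv positive",
    "antiretroviral dosage for adults",
]
for q in queries:
    print(q, "->", bucket_query(q))
```

Real pipelines would use clustering or topic models over millions of queries rather than hand-written keywords, but the output shape is the same: each query lands in a topic that can then be analyzed in aggregate.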
There are all sorts of questions that people are asking, ranging from things that are known, like, does the moringa plant cure HIV? It's known that people think this and ask about it. But there are also others that are very common in our data set but not necessarily known, like, honeybee venom cures AIDS, or chlorine dioxide cures HIV. These are search queries that are very common in our data set, but there wasn't necessarily an awareness among various ministries of health that these might be questions people are asking. We've been able to use this to support data collection efforts, so that people designing surveys can use some of these search data insights to ask the right questions, or improved questions, when collecting surveys. But what we were really interested in was this: when someone asks a question, what result do they see? What does the search engine give back to them as a response? So suppose you asked one of the most common searches we saw, does garlic cure HIV? At the time, you could do this on Bing, which is what we used, but also on Google, pick your favorite search engine. What you would see is that the top website highlighted was a website called miracleofgarlic.com, a website containing content that I, as a non-health expert, could immediately see contained a lot of unscientific information. This is what one would see, and it's not even just the top website; it's actually highlighted as a sort of answer. And this happened a scary number of times: there were other cases where we found content like this highlighted as an answer. So there's a real issue here. But in contrast, if you looked at, let's say, antiretroviral drugs for HIV, then the web page looks very different.
The top website is something that was rated highly by our RAs, who had graduate-level training in health and evaluated the contents of websites. But the whole page also looks very different: on the right-hand side you see a Wikipedia page highlighted, and you see suggested search queries about, say, a table of antiretroviral drug interactions, things like that. So very different web pages are being presented to people, depending on the topic with which their searches are associated. We worked with people who volunteered their time and have graduate-level training in health and medicine to evaluate the content of these web pages. What we found was that searches associated with natural cures and remedies rated at about 1.5 out of five, meaning they had serious problems with relevance, objectivity, and accuracy as measured by a health expert, whereas other searches, related to stigma and breastfeeding and drugs, rated at at least four out of five. This shows one discrepancy that existed in the search engine itself, leaving people to walk away with very different quality content, even though it might be the same person sitting down asking questions related to different topics. And there are many different reasons why this might be happening. There's a whole information ecosystem here that's worth thinking through: someone sits down and types a search query related to HIV and AIDS, the search engine does some sort of processing and maps that query to web pages, and what's returned to the person is a results page with, say, the top 10 web pages they could click on. That's what would happen if you did this on Google or on Bing.
And there are many different places where disparities or bias might creep in. One of them is just what someone types themselves; there we have less control, so we'll leave that aside. The other place is the search engine itself. One thing we found was that if you type does garlic cure HIV but you have a typo in your search, or the word ordering was not quite optimal, search engines do a sort of back-end processing to correct that before they try to map the search query to web pages. And we found that they were doing this back-end processing significantly less for searches related to natural cures and remedies than for, say, stigma and discrimination. So there's a bias creeping in in this back-end processing by the search engines. The second place it was creeping in, and I'll admit this was a shock to me, was that there just aren't enough web pages from high-authority websites that you could even present. If you said, I want to know whether garlic cures HIV, but I only want to see web pages from, say, the CDC, UNAIDS, the NIH, or the WHO, then the search engine might show you zero web pages; there may actually be no pages it could show you that could be a potential contender for a search query related to natural cures and remedies. In fact, there are something like four to seven times as many web pages available on drugs and antiretroviral drugs as on natural cures. And when these web pages don't exist, it doesn't matter what the search engine is doing; if the pages are not out there, there's nothing it can map to.
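The availability check described above, asking how many of a query's results come from high-authority health sites, can be sketched in a few lines. The domain list below matches the organizations named in the talk, but the URLs and the matching rule are hypothetical placeholders, not the study's actual methodology.

```python
from urllib.parse import urlparse

# Sketch: count how many result URLs for a query come from a fixed
# allowlist of high-authority health domains. The domains follow the
# talk; the example URLs are made up for illustration.

HIGH_AUTHORITY = {"cdc.gov", "nih.gov", "who.int", "unaids.org"}

def is_high_authority(url):
    host = urlparse(url).netloc.lower()
    # Accept the domain itself or any subdomain (e.g. www.cdc.gov).
    return any(host == d or host.endswith("." + d) for d in HIGH_AUTHORITY)

def count_high_authority(urls):
    return sum(is_high_authority(u) for u in urls)

results = [
    "https://www.cdc.gov/hiv/basics/index.html",
    "https://miracleofgarlic.example.com/cure",      # hypothetical low-authority page
    "https://www.who.int/health-topics/hiv-aids",
]
print(count_high_authority(results))  # prints 2
```

Running a check like this per topic is one way to quantify the gap the talk describes: for some topics the allowlisted count is routinely zero, so no amount of ranking cleverness can surface an authoritative page.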
So this is an opportunity for high-authority websites like the NIH, the WHO, UNAIDS, or the CDC to step in and present high-quality content that would otherwise be filled in by low-quality blogs or lower-authority websites. This was something we were able to share with folks at the NIH. I won't go into it in the interest of time, but let's step back again and think about the role of computing as a diagnostic. One thing we really have to be careful of here is that while this seems like an important role computing can play, it's very easy to assume that just because we know the extent of a problem, a solution will follow immediately, as if the diagnosis of a problem were equal to its treatment. We know this is absolutely not the case. Here is a quote I really like by Ruha Benjamin: data, in short, do not speak for themselves and don't always change hearts and minds and policy. At least here in the U.S., we know there are problems like mass incarceration or chronic homelessness where we are very well aware of the extent of the problem, but we don't necessarily have sufficient consensus to address it. So it's very important to recognize that just because we know the extent of a problem, it doesn't mean that treatment is going to quickly follow, because it absolutely might not. But of course, computing has also informed treatment. If you're in the mechanism design or optimization space, as I am, you can immediately think of examples where computational folks have informed the allocation of kidneys, seats in public schools, low-income housing resources, and many other things. This is a sort of treatment where we, as a community, have played a role in shaping how things are allocated for quite some time now.
People have won Nobel Prizes over this kind of work. And so it's important to recognize that computing is also serving as a sort of formalizer. What that means is that because algorithms, mechanism design, and optimization require us to be very unambiguous and very concrete about our inputs, our goals, and our constraints, they can actually shape how social problems are understood. What I mean by this is that a lot of times, when we have discussions about social problems, we say things like, the social worker must act in the best interest of the child, or, housing assistance programs aim to help the most number of people. And this is something we can all nod along to and say, yes, this is important. But it's very ill-defined. What do we mean by the best interest of the child, or even by helping the most number of people? These are vague phrases, and you could mean any number of things by the most number of people, or the best interest of the child, or the most qualified. So in a way, because computing forces you to translate these ambiguous sentences, which are very easy to agree to, into a very concrete math problem, it actually gives you an opportune site for contestation, or a natural target for advocacy. As an example from my own work, one thing we've been really interested in is problems related to housing assistance programs to alleviate housing instability and eviction in the U.S., where, again, you might hear things like, we want to help the most number of people. In this work we focused on the role of income shocks in leading to undesirable outcomes like eviction; we had a model, and we were able to ask a set of interesting optimization questions.
And what we found in that work is that even in a very stylized setting, where we were asking an optimization problem that stripped away a lot of the complexity and was really just asking a set of very foundational questions, you can end up in situations where you're actually doing the opposite thing depending on one reasonable setting versus a different one. Say you're thinking about an optimization problem whose objective is a min-sum objective, which says: I want to minimize the expected number of people who experience eviction. There you're trying to help as many people as possible, in expectation. Compare that with a min-max objective, which says: I'm going to take the person at the most risk of eviction and try to help them. That one is raising the floor, whereas the other is trying to help the most number of people in expectation. What we found was that even in this setting, where both of these objective functions seem reasonable, you could actually be targeting entirely disjoint sets of people. One could say you should focus on this set of individuals; the other could say, no, actually, you should focus on these ones, and that's the optimal thing to do. So here, what we were able to show was that what information you take in about families really matters, the objective function really matters, and the intervention type really matters. In that sense, what we were able to argue was that a lot of times, when we think about assistance programs, at least here in the US, it's very easy to say: well, there's a lot of waste, we just don't know who to target, we haven't done it optimally.
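To make the min-sum vs. min-max tension concrete, here is a tiny sketch with two hypothetical households and a budget of one intervention. All numbers are invented for illustration and are not from the paper; the point is only that the two reasonable-sounding objectives select different households.

```python
# Toy numbers illustrating min-sum vs. min-max targeting.
# "risk" is a household's baseline probability of eviction;
# "risk_if_helped" is its (made-up) probability after assistance.

households = {
    "A": {"risk": 0.9, "risk_if_helped": 0.8},  # highest risk, hard to move
    "B": {"risk": 0.5, "risk_if_helped": 0.1},  # lower risk, easy to help
}

def outcome(helped):
    """Risks of all households if we assist `helped`."""
    return [
        hh["risk_if_helped"] if name == helped else hh["risk"]
        for name, hh in households.items()
    ]

# Min-sum: minimize the expected number of evictions overall.
min_sum_target = min(households, key=lambda h: sum(outcome(h)))
# Min-max: minimize the risk faced by the worst-off household.
min_max_target = min(households, key=lambda h: max(outcome(h)))

print(min_sum_target)  # B: helping B cuts 0.4 expected evictions vs. 0.1 for A
print(min_max_target)  # A: helping A lowers the worst risk from 0.9 to 0.8
```

Both objectives are defensible readings of "help the most people," yet here they target disjoint households, which is exactly the kind of contestable modeling choice that formalization surfaces.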
If only we could just be more efficient about it, then this problem would not be such a big deal, right? But here, and I'm sweeping a lot of the details under the rug, here's an example where we actually solve this problem optimally in a lot of different settings, and it tells you to do very different things, and sometimes the opposite thing. What that says is that maybe the problem here is not efficiency; maybe the problem is that we just aren't investing enough in our housing and housing stability systems. That goes back to the point that in the US, something like three-fourths of people who could really use housing assistance just don't get it at all. The vast majority of people are not getting the assistance they desperately need. And eviction was a huge problem even before this pandemic; now we're in real crisis mode. So in this way, the optimization question I told you about provides a further opportunity for us to say: look, here is a concrete thing I can show you, and it's doing the opposite thing, so quit talking about efficiency; maybe this is about something else. It gives us an actual opportunity for advocacy. There are other roles for computing as well. I'm not going to go into those, but I recommend that you look at the paper if you're interested in the other points we wanted to make. But I want to close by highlighting where we go from the point where we've written these papers, like the ones I showed you from my own work, to actual change.
Because I think a lot of times, as researchers, it's very easy for us to assume that once we've written the paper and it's out there and we've given talks on it, as I have, our work here is done. But it really isn't, and I think that's something that sometimes gets lost in conversations. In particular, one thing I'll add here is that we really have to embrace, especially as computer science people, the responsibility we have to make sure that our research is understood and used in ways that are in line with what we had in mind. Because a lot of times we hear things like: well, that's the engineer's problem, or that's a policymaker's problem, or I'm just a researcher, I'm just an engineer, I'm just a scientist, or we didn't really consider that population or that particular thing in our data. There are ways of evading responsibility that have been normalized in our community, to be very honest, and we've seen the harms that doing so inflicts on vulnerable communities. So we really have to take some responsibility for how our work is understood and used. The second thing is that it is very, very crucial to build partnerships based on mutual trust and respect with domain experts and affected communities. And I really mean both of these. I used to say just domain experts, but then I think people started to assume that I meant experts in the traditional sense, trained in some other discipline. I actually think of affected communities as also being experts; their lived experiences are a sort of expertise that we can learn from as well. So I think it's really important that we see that as a crucial part of our end-to-end process.
And I'll give a shout-out here to an initiative, Mechanism Design for Social Good, that I co-founded and have been co-organizing, which really models some of these things I've mentioned that are very important to my research and to other people's research. This initiative is focused on the use of algorithms, optimization, and mechanism design techniques, in conjunction with insights from other disciplines, to improve access to opportunity for historically disadvantaged communities. It's an initiative that I'm co-organizing with the folks you see here. I'll give a shout-out to Irene Lo, who is Australian, so one of you all, and just a really wonderful person to work with and a wonderful scholar; you should check out her work. And really, everyone you see here has been a pleasure to work with. This initiative grew out of a reading group, an online reading group that we started in the fall of 2016, so it's been four years. I started organizing it at the time with Kira Goldner, who was at the University of Washington as a graduate student; I was at Cornell as a graduate student. Kira is now a postdoc at Columbia University. This was an online reading group of mostly graduate students; it was junior-led, and we were just trying to learn from different disciplines and domains to see where we could be useful, in conjunction with other disciplines, to improve access to opportunity. From that, the group grew steadily. We were able to organize a technical workshop series the following summer, the summer of 2017, about three and a half years ago. This picture was taken at the very first MD4SG workshop in 2017. It was the first time we were all in the same place, which was interesting to do after a year of working together online. But since then it's really expanded.
Now we have something like 1,700 people on our email lists, with individuals from over 100 institutions in 30 different countries engaged very actively in our workshop series, tutorial series, online working groups, online colloquium series, and the many other activities we run. A lot of our activities happened online even before it was a necessity, but certainly now they all happen online. So I encourage you to check that out. To give just one example, we now have domain-specific working groups, mirroring the original reading group but really diving into a particular domain, like housing, with a multi-disciplinary group of researchers and practitioners focused on that domain. And one shout-out I want to give: a while ago we started an Asia-Pacific working group. This is a working group started by Matt, who's been a fearless solo leader of it. It's quite a bit of work, and he's really been incredible in organizing it. It has, I think, 20-plus people, mostly in Australia but across the Asia-Pacific more generally, and it's doing something quite similar to what we did in our original working group: first just exploring a lot of different domains, with the hope that once there's a sizable presence of people interested in this interface, it can break off into whichever domains are most interesting to people. So I encourage you to check that out; it's on the website. And if you're interested in joining or just learning more, please feel free to reach out to me.
We also have a colloquium series that has hosted all different types of researchers and practitioners. Just to highlight two of them: Professor Al Roth, a Nobel Prize winner for his work on mechanism design for school choice and kidney exchange, among other things, was one of our speakers, and Dr. Araba Sey, an expert in the ICTD community and a senior researcher at Research ICT Africa, was another speaker we had last spring. It's a monthly colloquium series, so there are many, many people here doing wonderful work. And just to give you a sneak preview: the technical workshop series we had has matured into a conference, an ACM conference series that will be starting in the fall of 2021, so September 2021, fall in the US, spring I guess elsewhere. This is likely September 2021, maybe August 2021, and very likely to be virtual, but if magically physical events are possible, then it'll be at Columbia University. It's an opportunity to submit work from many different disciplines, get feedback, and be a part of the community. And with that, I'll stop. I'll say that if you're interested in my research, feel free to reach out to me. If you're interested in MD4SG, you can contact me, but you can also contact the whole MD4SG group, and we'll be able to tell you more about what we do and engage you in any way that makes sense for you. And with that, I'll stop. Thank you very much.