Hi, folks. My name is Carrie Anderson. I'm going to start us off today by saying that this is the final luncheon for the academic year; we'll start back up in September. I just want to thank a few folks who have been really instrumental in making sure these happen for us every week. We would not be able to do this without the space on the HLS campus; without the support of the HLS media services team; without the help of the events and facilities offices, specifically Rihanna McCarthy, who does all of our event organizing; without the support of Restaurant Associates, whose team, including Andrea and Stephanie, has been wonderful to us this year; and really the entire team at BKC — our events team, our communications team, and our ITS team. Alan and our IT team have been here every week; Victoria and Dan are on our comms team; and Ruben basically manages all of these and puts this together for us week after week. So I just wanted to quickly give them a round of applause for all of the support throughout the year. And I'm going to turn this over now to Amar Ashar.

There was one more person that we forgot to thank, which is Carrie herself — so thank you, Carrie, who is totally instrumental as the orchestra leader of all these lunches, interacting with all the guests and all the people who help put these wonderful talks together. So thank you to all the teams that really do such a wonderful job, and thank you to our guests.

I'm excited today to introduce Christo Wilson, who is a professor of computer science at Northeastern — I think one of the people doing some of the most exciting work in algorithmic studies, and probably one of the people building out the field of algorithmic studies — so it's a real treat that we have him today. Christo is also the head of the undergraduate bachelor's program in cybersecurity at Northeastern. He is involved in so many different things, and you're going to hear just one piece of his work today, related to algorithmic auditing, but there's so much that he's done in terms of personalization — looking at personalization and all the ways that algorithms influence our lives online.

Christo is also involved in Christian Sandvig's lawsuit with the ACLU, which is essentially examining the CFAA and challenging the provision in that law that says that breaking a site's terms of service constitutes a federal crime. Because, as you'll see in today's presentation, doing research that breaks terms of service — maybe not necessarily in this instance, but perhaps something we can talk about — is really important to research communities like the Berkman Klein Center and universities all over the country and all over the world. So it's a really important issue within this community; Christo is part of that effort, and he's doing some really exciting work there. I'm going to hand it over to him now to talk to us about bias in search engines.

Two other things I'll quickly mention before you start. One, we're being webcast, so please just be aware of that. Two, Christo definitely welcomes questions — and if you are going to ask a question, please identify yourself so we can have some context.

Well, thank you all very, very much for being here. It's an honor to be speaking at Harvard, where Latanya Sweeney arguably invented this idea of algorithm auditing when she looked at racial discrimination in online advertising. As Amar said, I'm a computer scientist.
So take everything I say with a grain of salt: outside of, you know, the nitty-gritty technical details, I have no idea what I'm talking about most of the time. So please feel free to ask me questions and challenge me in any way, and at the end I have some interesting discussion prompts to kick us off.

Today I would like to talk about one particular piece of work we did looking at gender bias in online resume search engines. This is the kind of search engine that a recruiter would use to go and find potential candidates, to bring them in for review and possibly hiring.

Before we get to that, I'd like to take a few moments to set the stage. We're increasingly finding that the social and structural biases that exist in society are embedded in the data being collected by companies and by governments. The unfortunate thing is that that data is then used to build various kinds of systems — we can use it to train machine learning algorithms, or we can use it for simpler tasks — and what that often ends up doing is taking these social biases and either reflecting or amplifying them. You end up with a machine that reflects a social bias. And this isn't necessarily intentional on the part of the engineers who built the system; it's just a consequence of the fact that the data has a problem and you've used it uncritically in some other context.

So increasingly we hear about this sort of thing in the news. Academics have been interested in this for a long time now, looking at uncritical uses of data, but you also see it in the news: people find examples of systems like predictive policing algorithms, say, where you have a system with a historical bias issue that ends up in the data, and now you have tools being used by police that are reflecting and reinforcing those social biases.

So the question I'm interested in is: how do we actually measure and understand these kinds of systems? It's one thing to say we think this is bad and engineers should do more — they should clean the data, or we should train systems that are fair by design, or maybe we need to pass laws or regulations that prohibit these kinds of uses of data. But at the end of the day, we still don't really know what's going on out there in the real world, in practice. Algorithms are trade secrets; companies aren't typically divulging the secrets of what's going on inside these black boxes. Even if you mandated that the source code, let's say, be transparent, that's not enough, because you also need the data — the data is what's ultimately driving the system. And if we mandate transparency of all data, there are huge privacy implications. Yes?

[Audience:] I'm Kathy Pham, a fellow at the Berkman Klein Center. We often hear people talk about the data being the problem, so it's great to have the algorithm up there as well. Can you share, from your experience and perspective, ways that algorithms reproduce and entrench the biases from the data?
Yeah, so there are many examples of this that get trotted out during every one of these conversations. One is predictive policing: ProPublica has done incredible work looking at the COMPAS algorithm in Broward County, Florida, showing that it was more likely to send African-American defendants to jail than white defendants. There's also Latanya Sweeney's work looking at racial bias in advertising: her work showed that if you did a search for an African-American name, you were likely to see ads saying this person may have been convicted of a crime — do a background check on them.

[Audience:] But if you just had good data, then there's nothing wrong — the algorithm is totally fine. So can you take it a step further and talk about the more technical part of why it's the algorithm?

Yeah. So, frankly, I kind of hate this argument. Some of the machine learning people really like to say that the data is the only issue — that the algorithm itself is sort of neutral and beautiful. And I don't really buy that, in the sense that someone is making a decision to implement a system. You could imagine using machine learning or not using machine learning; there's a variety of tools in the toolbox, and to say that machine learning itself is inherently neutral and it's purely a data issue ignores the autonomy of the engineer and the design decisions that go into building these systems. This also gets into other issues like how data is being collected, and dark patterns: are you soliciting information from people to then, essentially, use it against them? There's a lot of decision-making that goes into every aspect of this, from the up-front user interface to the back-end processing of data, and those are all decisions being made by people and implemented in code. So to say, well, it's just a data issue — that's too short-sighted, in my opinion. But I'm happy to be challenged on that.

[Inaudible audience question.] Hold on one second — so, this: I'm going to be focusing on all kinds of jobs, low- and high-level.

So, in order to try to understand what these systems are doing, there's this emerging field that I like to call algorithm auditing. I consider myself to be an algorithm auditor, and what that means, essentially, is that I go and find systems of interest — things that are high impact, where I believe they affect many people's lives — and I try to figure out how they work. What data is being collected? How is that data being used? And, ultimately, what is the impact of these systems on people? Because that's what we really care about. So I'm a bit of a reverse engineer — that's a simple, straightforward way to think about this.

Today in particular I want to talk about hiring websites. You've probably heard of many of these: Monster, LinkedIn, Glassdoor. These are platforms used by millions of people to find employment. You may have used one of these platforms, probably from the perspective of a job seeker: you went and created a profile, uploaded a resume, and then you could search for open jobs and apply to them. But there's another facet to these tools.
They're also used by recruiters to do active recruiting: they can search the database of all resumes to try to identify potential employees and reach out to them. This is like head-hunting — "I think you'd be great for this job, come in for an interview."

So I would argue these systems are systemically important: millions of people rely on them as a way to find employment. And this is a context where we know there are social and historical biases that regulation has tried to deal with over the years, so how these systems play into that is, I think, very interesting and consequential.

To give you an idea of what the system looks like from a recruiter's perspective: it really just looks like a search engine. Here's an example from Indeed. I'm searching for a software engineer located in the New York region. In the left-hand rail you have your filters — how far away, specific skills, years of experience, that kind of thing — to refine the people you're interested in. And then you're given a ranked list of people. These are people who have created a profile and uploaded a resume and who match your criteria, the same way you might search for something on Google. So this is very interesting, but also sort of troubling — to think of this as a people search engine.

If you think about Google and how important being number one in the rank is, the same logic applies here. The people at the top of the ranking are privileged: they're the ones who are likely to be seen and head-hunted by recruiters. So being ranked highly in this system matters; it directly impacts your ability to get recruited and find employment.

Now, the ranking metric used in these systems is opaque. It just says: we rank them by relevance. Nobody really has any idea what that means; that's up to the engineers of the company. The other thing you'll notice is that there is essentially no demographic information available here. It doesn't say whether these are men or women, or whether they're Black or white. And that's intentional: when you sign up for the service, the companies don't ask for that information. They don't want to know your demographics, because at the end of the day they would like decisions to be made not based on demographics — that would be discriminatory. So, to the greatest extent possible, they try not to collect it, and they try not to show it. So on one hand this seems good.
You have a system that's not collecting demographics and not showing demographics, and that seems like it would create a system that is demographically blind. It should be neutral to all these different categories, and that should do something to help eliminate bias: if you're a recruiter looking at this and you can't tell people's demographics, it prevents you, as an individual, from bringing your internal biases to bear. You can think of this as being like a blind audition for an orchestra. Historically, orchestras were incredibly biased — they were primarily men. They instituted a system of blind auditioning, where you play behind a curtain so no one can see if you're a man or a woman, Black or white, and this had the effect of dramatically increasing diversity in orchestras. It eliminated demographics from the decision-making.

But here's the problem: the demographics aren't really gone. They're just sort of buried. Ultimately there is a ranking algorithm here, and it's looking at things like: Where did you go to college? How many years of experience do you have? What's your current level of seniority in your company? And those are all things that are linked to demographics. Different people have different levels of opportunity; there are things like glass ceilings; and all of those systematic biases end up being embedded in the data. So if the ranking algorithm is using those things as features to decide who is relevant, who is best, it could in turn just reflect these social inequities. Remember: everyone showing up at the top of the page is privileged. So if the ranking algorithm is taking these things into account and systematically moving some people lower, that's directly impacting their ability to find employment. Yes?

[Audience:] I just had a question about the previous slide. How could it be that, with the search being "software engineer in New York, New York," you only returned people who are living in New Jersey?

So it was a 25-mile radius around New York.

[Audience:] But wouldn't it be privileging New York, New York as a factor? It should.

It's interesting that the top candidates happen to be coming from New Jersey in that case. I'm not exactly sure what that signifies — something about the prices of real estate in Manhattan, even for software engineers.

So, this idea that bias can creep into these kinds of hiring systems is not hypothetical. At one point we ran a study looking at gig economy services like TaskRabbit. If you've never used it, it's the kind of service where you can go and search for someone to walk your dog or mow your lawn. We were interested in whether there were demographic biases on this service, and the important thing our study found is that there were. They come from the ratings from the users: users hire these workers to do things and then they can leave feedback, and that feedback, it turns out, is biased. We found that women and people of color systematically received lower ratings than white men. What that ultimately meant was that the search engine produced biased results, because it ranked people based on their ratings. You have a bunch of people who will walk dogs — who should be the top person?
Well, it's the person with five stars. But they failed to consider the fact that those star ratings embed the bias of the people generating the data. Another case that was in the news: Amazon built an AI system to try to screen hiring applicants. Unfortunately, it was trained on their prior hiring data, which had a systematic bias in favor of men — specifically male engineers — and, lo and behold, they ended up with a system that systematically discriminated against women. So it's not at all hypothetical that a system like this could have a bias. The question is: does it?

In particular, for this study we're going to be looking at three search engines — Indeed, Monster, and CareerBuilder — and we're going to be focusing on gender bias. Are these resume search engines systematically reducing the rank of women, or vice versa? Is there some kind of gender bias inherent in the ranking algorithm?

We're going to look at this from two different perspectives — essentially two different kinds of fairness. First, we'll talk about individual fairness, which you can think of as equal opportunity: two people with the same skills and the same qualifications should essentially appear at the same rank, once we've ignored gender. Same skills means same rank — equal opportunity. However, there's another way to conceptualize fairness, which is group fairness. What that means is that, overall, the average rank of men and women should be the same — regardless of skills, regardless of qualifications, on average we should all be the same, regardless of gender. You can think of this as roughly equivalent to disparate impact: is one group systematically different from the other?

If we find that the system isn't fair with respect to gender, then we have a separate question, which is: was this intentional? Was the system designed to use gender — is there direct discrimination — or is this just an artifact of the data, something embedded in it, with proxies surfacing, that is causing this?

So I'll go through a bit of our data collection — how did I get all this data? — then I'll talk specifically about how we analyzed individual and group fairness. We'll answer the question of whether or not there was direct discrimination going on, and I'll briefly conclude, and then we can discuss and you can rip apart my study. Any questions so far? Cool.

Okay. So this is an algorithm audit, which means that my subjects are algorithms. That's a weird thing to say, but that's essentially the framing here. My subjects are the search engines of Indeed, Monster, and CareerBuilder, and we chose these because they're extremely popular: as you'll see, we get hundreds of thousands of candidates from these search engines, and that's from just a subset of the cities they operate in — they have millions of people. We would have liked to do more, like LinkedIn, but LinkedIn is very expensive and very litigious, so we decided not to look at them.

On Monster and CareerBuilder, the candidate data is not public; you have to sign up as a recruiter. So we did that — you pay a couple hundred dollars and you get access to the recruiter tools. That required some light bending of the truth. And on Indeed, this is all public: anyone can go and look at the recruiter tools.
There's no gating involved.

With respect to my data needs, essentially what I have to have are search results — millions and millions of search results — so that I can analyze them all to figure out: are there systematic differences between the men and women who appear in the rankings? So we ran a lot of different queries. We chose 20 different cities to run queries in, and these were chosen for diversity: we want big cities, we want small cities, we want cities that are demographically representative of the US, we want cities that are majority-minority. We need diversity here. In every one of the 20 cities, we ran searches for 35 different job titles, split between high- and low-skilled positions. The high-skilled jobs are things like software engineer, accountant, and registered nurse — things that require accreditation. The low-skilled jobs are things like mail carrier, taxi driver, or customer service — things you could get with, let's say, a high school diploma. Again, we do this for diversity: we want things that require a college education and things that don't.

So we run all these searches on these recruiting tools. What is the actual data we're getting back from them? We're getting search results that look like this. Right off the bat, you get things like the candidate's current job title and position — are they a software engineer? a senior software engineer? — and you get their experience and education: do they have a college degree, a high school degree, a doctorate? And on Indeed specifically, we also got full resumes, so for the people on Indeed we actually have much, much richer data. Unfortunately, we couldn't get that from Monster and CareerBuilder. In total, on Indeed and Monster we had 13 different variables per candidate, and on CareerBuilder we only had six — you'll see this is sort of a theme; CareerBuilder is the weakest of the three platforms we studied.

So this is good, but I'm missing one critical thing: this is a study about gender discrimination — where are my genders coming from? What we did is we inferred people's gender based on their name. Fortunately, we're running this in a Western society where first names are heavily gendered. You can go and get the US baby names dataset — the names and sexes of everyone born in the United States for the last N years — and you can use it to calculate, for a given first name, the likelihood that it belongs to a man or a woman. Is it a male name or a female name? So essentially what we do is assign a probability of being masculine to every candidate in the search results.

I have three lines on this plot, one for each of the services, with the probability of being masculine on the x-axis — one meaning I'm really, really certain you're a man, and zero meaning I'm really, really certain you're a woman. You can see that the distribution is essentially bimodal: for most of the candidates — around 45 percent at each end — the name is heavily gendered. At one end is someone with a name like Michelle; it's highly unlikely that this is a man. At the other end of the spectrum, you have someone with a name like Christopher.

What's that? Yep — Chinese names are a huge problem, because they're not inherently gendered. This weird artifact in the center — these are essentially foreign names, or names that just never appeared in the US baby names dataset. For around eight percent of the population, I don't know what their gender is. That's something to keep in the back of your mind when I'm running the analysis: there's eight percent uncertainty here for some of the candidates. But by and large, I'm pretty confident in our labeling.
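To make that labeling step concrete, here is a minimal sketch of name-based gender inference, assuming the publicly available Social Security Administration baby names files, where each line reads "Name,Sex,Count" (e.g., "Mary,F,7065"). The directory layout and function names are illustrative, not taken from the study itself.

```python
# Sketch: estimate P(masculine | first name) from SSA birth records.
from collections import defaultdict
from pathlib import Path

def build_name_model(ssa_dir: str) -> dict:
    """Map each first name to P(masculine | name) using SSA birth counts."""
    male = defaultdict(int)
    total = defaultdict(int)
    for f in Path(ssa_dir).glob("yob*.txt"):  # one file per birth year
        for line in f.read_text().splitlines():
            name, sex, count = line.split(",")
            n = int(count)
            total[name.lower()] += n
            if sex == "M":
                male[name.lower()] += n
    return {name: male[name] / total[name] for name in total}

def p_masculine(model: dict, first_name: str):
    """P(masculine) for a candidate, or None if the name never appears
    (e.g., many non-Western names) -- the ~8% 'unknown' bucket above."""
    return model.get(first_name.lower().strip())

# model = build_name_model("names/")
# p_masculine(model, "Michelle")    -> close to 0.0
# p_masculine(model, "Christopher") -> close to 1.0
# p_masculine(model, "Xiu")         -> possibly None (absent from SSA data)
```

Any candidate whose first name never appears in the files ends up unlabeled, which is exactly the roughly-eight-percent "unknown" bucket described above.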
So, the summary of the dataset is as follows: 35 job titles, each queried in 20 different American cities, and for each of those queries we collect all of the search results — around a thousand candidates per search. This was all collected in 2016. In total, we had around 500,000 job seekers on Indeed, 260,000 on Monster, and 68,000 on CareerBuilder. So we have tens to hundreds of thousands of individuals to put into the models to understand what's going on.

Now let's go back to our research questions. First, we're going to talk about individual fairness — essentially equal opportunity: do two people with the same qualifications show up at essentially the same rank, regardless of gender? Then we'll get into group fairness: do men and women on average have the same rank, or is there something like disparate impact going on — a systematic difference between men as a group and women as a group? And then we'll get to the third question, which is: was this intentional? You already have a preview — I must have found something, or I wouldn't even be asking question three.

We'll start with individual fairness. The idea here, again, is that if you have the same kinds of features — the same experience, the same education, the same kind of current job position — you should be showing up at the same rank. To analyze this we're going to use regressions. The dependent variable in my model is going to be rank, because that's what I care about: is the rank between these candidates roughly the same? And to do the modeling, we're going to use a mixed linear model. Without getting into the details, the reason we do this is that as you go across job titles or across cities, there are just variances in the skill sets. If I'm comparing software engineers in Mobile, Alabama against software engineers in New York City, they're just systematically different — they're different locations. This kind of model takes that into account; it makes the comparisons across jobs and across places more fair.
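As a rough illustration of that modeling setup — not the paper's exact specification — here is one way a mixed linear model with rank as the dependent variable could be set up in Python with statsmodels. All of the column names are hypothetical stand-ins for the 6–13 per-candidate variables described above.

```python
# Sketch of the individual-fairness regression, assuming one row per
# (candidate, search result) in a CSV. File and column names are assumed.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("search_results.csv")  # hypothetical input file

# Random intercepts grouped by (job title, city) absorb the systematic
# differences between, say, software engineers in Mobile vs. New York.
df["group"] = df["job_title"] + "|" + df["city"]

# Fixed effects: inferred gender plus the qualification features.
model = smf.mixedlm(
    "rank ~ p_masculine + years_experience + education_level + seniority",
    data=df,
    groups=df["group"],
)
result = model.fit()
print(result.summary())
# A significant p_masculine coefficient, after controlling for the
# qualification features, indicates unequal ranks for equally qualified
# men and women -- the individual-fairness test.
```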
Okay, so jumping right into what we found: we did find a statistically significant difference between the ranks of male candidates and female candidates. I am confident that what I found is a real effect. But we have to ask ourselves what the effect size is — just because I found something that's real doesn't mean it's actually important in the real world. What I'm showing on this graph is the difference in rank: essentially, the amount that ranks improve for male candidates as you go down the list of results. If we're only considering, say, the top 20 candidates that come back in the search results, what is the advantage being conferred to male candidates — versus, say, if you go down to rank 100, what is the relative advantage being conferred to male candidates?

If we focus on just the top ranks, there is an advantage to men, but it's so small that it's almost imperceptible. If we're just talking about the top 10 results, men get a slight advantage, but it's less than one rank — essentially meaningless. The difference only becomes meaningful when you get deeper into the results. By the time you get to, say, result 50, if there's a man and a woman with the same qualifications, the tie will be broken in favor of the man, and he will get a higher rank. And this continues to grow as you get farther and farther down.

[Audience:] Quick question — did you take into account differences in the proportion of men and women on these lists? Because if you had 10 percent women, then it would be expected that they have a lower rank on average.

Yep. We have to, because for, say, software engineers, the population is hugely imbalanced. So this is all normalized to the relative proportions in the population.

So this is sort of a mixed bag. Yes, on one hand, my models say there is a systematic advantage being conferred to male candidates here — that sounds like discrimination — but the effect size is very small. Whether this is actually a problem in the real world, we don't know, because we don't know how people use this system. Does a recruiter only ever look at the top 10 results? Then maybe this doesn't matter. But if recruiters look deeper in the lists — if they look at 50 people or 100 people — then I would argue this very much matters, and it's something we should take very seriously.

So that's individual fairness, where we're considering people's qualifications. What if we switch gears a little bit and talk about group fairness: just in general, are men and women treated the same? For this, all we're looking at is the distribution of ranks for men and for women, and we're comparing them directly. What we found is that in most cases the results are group fair: the average rank of men and the average rank of women is the same for most of the job titles we searched. But not all. There were a couple of cases with systematic differences, and most of these, unfortunately, occurred in engineering fields. For things like software engineer, mechanical engineer, and network engineer, in somewhere between 8 and 13 percent of the queries we ran, men appeared on average at higher ranks than women — even taking into account that women were a smaller fraction of the population. And in all of these cases, the unfairness favored male candidates; we didn't see a single case where it was unfair in favor of women. So this is pretty concerning. I'm a computer scientist, and I cringe when I see this. We know we have these huge inequities in the tech sector, and you just see it being reflected right back at you by this system. So this is pretty unfortunate, in my opinion.
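Here is a minimal sketch of what a per-query group-fairness check like this could look like. The Mann-Whitney U test is my illustrative choice for comparing two rank distributions; the paper's actual statistical machinery may differ, and the confidence and sample-size thresholds here are assumptions.

```python
# Sketch: for one query's results, compare the rank distributions of
# (inferred) men and women directly, with no qualification controls.
from scipy.stats import mannwhitneyu

def group_fairness_check(results, threshold=0.8):
    """results: list of (rank, p_masculine) pairs for one query.
    Candidates with uncertain or unknown names are dropped, as in the talk."""
    male_ranks = [r for r, p in results if p is not None and p >= threshold]
    female_ranks = [r for r, p in results if p is not None and p <= 1 - threshold]
    if len(male_ranks) < 30 or len(female_ranks) < 30:
        return None  # too few of one group for a meaningful test

    # Two-sided test: do the two rank distributions differ? Comparing whole
    # distributions (rather than raw counts) is what normalizes for the
    # unequal base rates of men and women in the candidate pool.
    stat, pvalue = mannwhitneyu(male_ranks, female_ranks, alternative="two-sided")
    return pvalue  # small p-value => ranks differ systematically by gender
```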
Our final question is: is this intentional or not? Does the ranking algorithm use gender as a feature explicitly, or is this something that's just coming up naturally because of biases embedded in the data? The way we're going to test this is through controlled experiments. We created accounts for job seekers and uploaded resumes to see how they would rank, and then we would change their attributes to see if it changed the ranking.

Here's a quick example. Let's say these are the original search results — that's the top person, that's the bottom person. I upload two resumes, from Joe and Bob, and I record their rank: the order they appear in. Then I go back and change Bob's name to Amy — I essentially flip his gender. What I'm looking to see is whether that causes the ranks to switch. If the algorithm is considering gender, that's what you would expect to happen, because that's the only thing I've changed — all I've done is change the name.

So we ran a lot of controlled experiments, on a bunch of different variables. We tried doing things like changing the length of the resume; we changed how many keywords or skills you had; we changed where you went to school. And the results were sort of all over the place. On Indeed, none of the tests mattered — nothing impacted the rank. On Monster, two of the tests actually mattered: if we changed where you went to school, it did flip the rank, and similarly, whether you said "I'm currently employed" or "I'm not currently employed" also flipped the rank. On CareerBuilder, there were also two features that mattered: whether you were currently employed, and how much job churn you had — have you been in a position for ten years, or have you had ten jobs in ten years?

But the most important thing is the top row: when we changed the person's name to change their gender, it didn't impact the rank at all. What this demonstrates is that the algorithms are not trying to infer your gender and re-rank you based on it. So it does not appear that there is direct discrimination here — there's no attempt to determine who you are and then re-rank you based on your demographics.
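The swap experiment itself is conceptually simple; here is a sketch of its logic, with the platform interactions stubbed out. `rank_of` and `set_name` are hypothetical placeholders — in practice, each site's recruiter interface would have to be driven manually or with custom automation, and the one-hour re-indexing wait is an assumption, not a figure from the study.

```python
# Sketch of the paired name-swap test described above.
import time

def rank_of(query: str, candidate_id: str) -> int:
    """Hypothetical placeholder: run the recruiter search, return the rank."""
    raise NotImplementedError

def set_name(candidate_id: str, name: str) -> None:
    """Hypothetical placeholder: edit the name on an uploaded resume."""
    raise NotImplementedError

def swap_test(query: str, a: str, b: str):
    # 1. Two otherwise-identical resumes under masculine names.
    set_name(a, "Joe")
    set_name(b, "Bob")
    time.sleep(3600)  # assumed wait for the site to re-index the profiles
    before = (rank_of(query, a), rank_of(query, b))

    # 2. Flip exactly one gender signal: Bob becomes Amy. Everything else is
    #    held constant, so any rank change is attributable to the name.
    set_name(b, "Amy")
    time.sleep(3600)
    after = (rank_of(query, a), rank_of(query, b))

    # 3. If the ranker infers and uses gender, the relative order should
    #    flip. In the study it never did, on any of the three sites.
    return before, after
```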
Okay, so, just to quickly wrap up: there are some serious limitations to this study. We are using inferred gender based on name, and that is not the same thing as someone's true gender, so that impacts our analysis. Also, we have a very simplistic view of gender — we're just treating it as binary, which of course is not great. There are questions here about ecological validity: we can get into the details of whether I modeled this correctly, and we can also get into arguments about whether these results matter — do recruiters actually use the tools in the way I've assumed in my analysis? We don't know, because no one has ever studied their behavior.

One unsatisfying thing about this work, and about algorithm auditing in general, is that you can't really do a lot of causal explanation. I showed that there's no direct discrimination based on gender going on, but I don't have a satisfying explanation for why there was unfairness. It's there, and it's likely being caused by some kind of proxy embedded in the data, but we don't know what or why. And the last thing: we did these controlled experiments, but I can't possibly argue that they are comprehensive. We tested seven or eight different variables, but there could be tens or hundreds more that we just haven't thought of.

So, to conclude, I would like to re-emphasize that we don't see these companies doing anything intentionally discriminatory — so don't start filing your lawsuits. That said, there is individual unfairness here, systematically in favor of male candidates over female candidates, but the effect size is very small; whether this has any kind of practical impact on the real world is unclear. With respect to group fairness, again, we saw problems primarily in technical positions, but this appears to be an emergent characteristic of the data, not an intentional decision to rank people based on demographics. This is the reason I have the glass half full here — this is very much a mixed-bag set of results.

So I'll stop there. I would love to have a chat about all these things, and here are a couple of interesting questions. One is: should hiring systems, or even just algorithms more broadly, do more to combat these kinds of pre-existing social biases? If we take it that there are problems with the data, is there a responsibility for companies to take that into account when they engineer their systems? If so, what kind of fairness should they be going for? There are very elegant results saying that, in most cases, you can't have individual and group fairness at the same time, which means you have to make normative choices. Who should make those choices?

Another really interesting thing is that in order to make these systems fair, you have to control for demographics. If I know your gender and your race and your religion and your political affiliation and so on, I can control for them, normalize, and build a fair system for you. But that requires people to change their behavior: they have to be willing to tell us those attributes so that we can put them into the system and do the normalization. Are people going to do that? Would you be willing to give all that sensitive data to a resume search engine, or a bank, under the promise that they're going to use it to make things fair, and not just use it to discriminate?

[Inaudible audience question about race.] Not as much as you'd think — we tried. Originally this was conceived as both a gender and race study, and inferring race from names actually was terrible. It was really bad. It works in some obvious verticals, but for African-Americans it just doesn't really work.
So the whole thing was just sort of bankrupt.

And then there are questions about me. I'm an algorithm auditor, and I think this is necessary — I think people should be held accountable, and I think there should be transparency. But there are debates about whether this is a legitimate thing: should I be allowed to do this in violation of someone's terms of service? What gives me the right? So, with that, I will shut up.

Thank you so much, Christo. Just to keep some order here, maybe people can raise their hands — we've got a couple of different mic runners, and we'll go around the room — so just raise your hand so we can keep an eye on you. Hey, over here.

[Audience:] Quick question. Oh — yeah, I'm Momin Malik; I think we've met at ICWSM, so good to see you, too. I'm a fellow here at Berkman Klein. If you didn't control for the base rates, how much does that affect it? I mean, you're controlling for that, but if we didn't — if we looked at the quote-unquote pipeline problem — then how much worse is it?

Yeah, so our baseline here is just everything we saw. For, say, software engineers, we look at all the search results for software engineers and use that to estimate what the true population of software engineers is. But as you said, we have no idea how reflective that is: we don't know who uses these services, and we don't know what's coming out of the pipeline. So the short answer is, I just don't know. Maybe you could use something like the ACS data to try to get a better baseline, but I really don't know — we're starting to get into very narrow verticals here.

[Audience:] My question is: in your opinion, given all the work that you've done, would you recommend that a woman change her name to get an advantage — or at least use her initials? Is it assumed that that's a real woman? And presumably the other way around, too — you wouldn't want to be a boy named Sue.

So this is a fascinating question. Before I started this work I had never heard of resume whitening, but it is very much a thing: if you are a person of color, you will do things to make your resume seem more like a white man's. If you're Muhammad, then you're "M. something" — you put your initials. So I guess whether you want to do this depends on who your adversary is: is it the algorithm, or is it a human recruiter? If your adversary is a human recruiter — you're worried about human social biases — then I think whitening, or "masculifying," or whatever you want to call it, is probably a viable strategy, because we have a pretty good understanding of the social biases of people. If, instead, your adversary is the algorithm: the algorithm doesn't really care about your name or those things. Instead, it's looking at things like what college you went to. So if you went to, let's say, a historically Black university, a recruiter might miss that, but the algorithm doesn't. And now it's a question of: do you lie about where you went to school, or do you just say "I have a college degree" without saying where? It's harder to change a lot of those features without systematically disadvantaging yourself in other ways. So it's a really complicated question, because the ways these individual attributes link up to demographics are so complex that getting rid of them entirely is very hard — finding these patterns is what machines do best.

Hi — to your left. Hi.
[Audience:] I am Paco, and I'm a Harvard alum from, like, decades ago. Very interesting, your project, and I wanted to say — obviously because it hits me — that more and more people are going to be coming into the ranks of the older generations, and if you keep going in this vein, I wanted to ask you: you talk about gender, race, sexual orientation — what about age? Age is such an automatic thing, because years of experience or year of graduation will automatically be there. How are you going to preempt it? I know this is going to be so difficult — to avoid this bias against all of the elders who still want to work. Anyway, thank you.

So this is an extremely cogent point. There have already been lawsuits against these companies for age discrimination, because that's an obvious thing to rank on, and it's obviously discriminatory. Fortunately, because age is intrinsically linked to things like years of experience, it's easier to measure and control for. In that sense you're a little bit better off, because it's a visible attribute, so it's easier to try to remove that kind of issue from the data. But then you run headlong into people's expectations of the system: "I did this search, I kind of expect the most qualified people to be at the top" — and now I'm explicitly discounting that in a way, and the results I'm getting are not what I expect. So there you're running headlong into interface-design issues, and we don't know how people will react. But I agree — this is a systemic problem.

We've got one, two, and then three. So why don't you go ahead — hi.

[Audience:] Hi, I'm Raj Pargav, over at Microsoft Research. I had a question relating to what you mentioned earlier, about how this is just one step of the process and people interact with technologies in various ways. For example, something I can imagine here: if I'm a recruiter and I query this, I can just ignore all the female names, no matter where the ranking puts them.

Yep.

[Audience:] But obviously you're very involved in algorithmic auditing and you think it's valuable. So I guess two questions. One: what do you think is the value of doing this from a technical perspective — what can we gain purely from your field and the research you do? And two: how do you think computer scientists and other people doing this work can work together with anthropologists and other humanists who study how people actually use these technologies? What do you think is the best way for that collaboration, overall?

Yeah. So, with respect to the CS stuff: part of this work is just about activism and awareness.
I do this because it's a context that people haven't considered. There are decision-makers who aren't aware of these issues. I keep hammering on this because I want everyone to think about it and start implementing processes to deal with it. Another interesting outcome is just metrics for trying to measure things. Actually, I'll admit that I was pretty unsatisfied with the methods we used in this study to look at rank, and we've now developed new methods — we just published a paper on them — for doing much better assessments of group fairness in these ranked contexts. Those metrics didn't exist before. Or I'm thinking of things like the ProPublica COMPAS dataset, where no one had thought about some of these axes of disparity, and now there's tons of machine learning work that tries to equalize error rates and things like that. So I think there's a lot to be said for this: you expose issues, and that leads to a virtuous cycle of better algorithms and better tools.

Then, with respect to anthropologists: this has to happen. I'm an engineer — a low-level engineer by training. This work just kind of arrived to me; I didn't set out to do it, and frankly I'm not super well equipped to grapple with these sociological issues. So working with people outside of computer science is super important, and I think there's a huge appetite for it, if the CS people are willing to speak the language and move at a little bit of a slower pace. There are really fruitful collaborations to be had there. And frankly, the sociologists and the anthropologists just know more about these issues.

[Audience:] Hi, I'm Stavella, and I'm a fellow at the Berkman Klein Center as well. I'm wondering how you selected those categories to test — like university, education, and so forth. And I'm wondering: did you get the chance to look at the network requests and see what variables they're expecting — what Indeed is collecting as you click on the service and scroll down — that could be used for determining rank?

Yeah. So, for our controlled tests, the selection of those variables was entirely ad hoc: we basically just went through a couple hundred resumes and said, it looks like these are the common denominators. It was not systematic in any way.

With respect to the data they're collecting: I don't think they're doing online learning, in the sense that the click data is actively driving the ranking algorithm. We controlled for that in our tests, because we didn't know — everything was measured in lockstep, because I didn't want clicks to influence it. That said, I have no idea whether they've taken a historical dataset of clicks and used that to train the algorithm. Just because they're not learning right now doesn't mean that's not where the bias came from originally. Like you mentioned — a recruiter just ignoring the female names could easily produce this kind of outcome, and I don't know, because I don't know what their training data was. But in terms of the nitty-gritty, there wasn't an awful lot of PII being collected. There was nothing obvious like "I'm trying to infer an attribute about you to then drive a personalization algorithm." None of these services are that sophisticated.
[Audience:] You made a comment earlier that was, I think, "hold your lawsuits" — these algorithms are not intentionally doing anything wrong. So I'm curious what your thoughts are on intent: if your system at the end of the day has a certain result, should they be held accountable?

So I absolutely think they should be held accountable. My off-the-cuff "hold your lawsuits" argument was more in the vein of: you could do it, but I think it would be a challenging argument to construct. Does what we found reach existing thresholds, like the 20 percent rule for disparate impact? I don't think it quite does — but that's just my opinion, and it does not rule out novel legal constructions here.

But the other thing is: I am absolutely in favor of accountability, but I'm also troubled, because I do a lot of accusing — I measure someone and I say "this is bad" — but I don't have a commensurate amount of tools on my side to help people fix things. We're starting to see more of that, but it is a little bit disingenuous, frankly, of me to say "look at all these issues" and then, when asked "okay, well, how do I fix this data when I don't know people's gender?" — well, actually, I don't know how you can fix it. So there are a lot of things that have to change to really eliminate these issues, some of which may be lawsuits.

[Audience:] This is a question that will probably expose my ignorance, but if I understood what you said, in your analysis there was no conscious gender bias — but it's clear from your data that there is an unconscious gender bias.

Mm-hmm.

[Audience:] Now, is that what's in the data, which you can't get at — the background to the algorithm?

Yeah, that's the most likely explanation.

[Audience:] In which case, if you can repeatedly demonstrate this, why can't you make a case for changing the data?

So, changing the data is complicated. One avenue for doing that would be changing people's behavior. If the data is just a representation of society — where people go to school, who decides to hire men and women for technical roles, let's say — then changing the social processes that ultimately produce this data is super hard. We've been trying to eliminate discrimination from society forever, and it's not gone yet. So really attacking the root of the data-generation problem is equivalent to attacking the root of discrimination in society.

Now, we can take a less ambitious take on it, which is: we know that society produces data that has biases, but we can just take this pile of numbers and try to fix it. But that requires having more data. If I want to take that pile of data and fix it with respect to gender, I have to know people's genders. If I want to fix it with respect to sexual orientation, I have to know people's sexual orientations, so I can quantify them and control for them. And I can do that — I have the tools — but the question is: are you willing to tell me those things? Will you tell me your sexual orientation, or your gender, or your race? Are you willing to do that in a healthcare context? In a banking context? In a police-enforcement context? Because if you are, then I can fix things; but if you're not, then I'm sort of at a loss. So fixing the data is complicated.
You don't seem entirely convinced by my answer.

[Inaudible follow-up.] Yes — yeah, so that's a great question for any company that's using data uncritically. You get the sense in Silicon Valley that data is assumed to be neutral, and that computation is assumed to be neutral — and it is not; it is inherently unfair. So if you're choosing to use it, you are choosing to buy into the status quo.

[Audience:] Building on your comment about whitening the resume or the CV: I think progressive corporations are certainly saying, look, we are working within a global economy, and we cannot afford to have everybody in our workforce be of that same demographic paradigm that has existed for hundreds of years. If we are going to sell, and be progressive, and be leaders in our market, we have to recognize diversity — take the algorithm and search for women, search for persons who are Chinese, Japanese, Asian, African-American, and deliberately bring them into our workforce, because they can help us in our analysis of the market, as well as in our productivity and profitability.

Yep. So I think that's great, and I applaud any company doing that. For example, I know LinkedIn has done a lot of work to try to intentionally diversify their search results, by seizing on exactly what you just said. But there are counterpoints. For example, on Monster — at least when we looked — before you pay the money to get a recruiter account, there's a demo tool, and the demo tool lets you do things like search by diversity. So as a recruiter, you can intentionally do that. But when you actually pay the money and get the real tool, that option isn't there anymore. Which was very surprising to us.

[Audience:] Nothing like going public and telling those who are looking for jobs that Monster allows employers to bring on diverse employees — but once we become part of the organization, they no longer give us that opportunity. So I just don't think we should have to hide who we are. Because once you walk into that door, and they get to see that you're not Todd, or that you're not "W. so-and-so last name," the face goes red, thinking: oh my god, what have I brought into my office? So I just think about being upfront and honest with people from day one.

Yeah — no, I agree. But I'm also not disadvantaged, so I feel like it's not my place to say what people should or shouldn't do. Ultimately, though, I completely agree with you: people can't hide, and hiding doesn't solve anything.

[Audience:] Hi, my name is Kate Coyer. I'm an affiliate with the Berkman Klein Center, and I'm actually asking a question for a colleague who saw me on the video — he's watching remotely. So there's that. It's related, but it's a three-part question, specifically about the data collection. One: as you know, the tech sector heavily recruits immigrants from different countries — how do you take that into account when you can't identify the genders in those names? I know you mentioned the eight percent of question marks, but could you address specifically how you account for that? Two: did you collect your ranking data on the same day? Because of the way companies update their algorithms on a daily basis — how do you control for that? And finally, how do you take into account the disproportionate gender balance in certain fields — like computer science versus, say, nursing — and how do you make the comparative analysis there? Thanks.
Yeah. So, with respect to the foreign names that we can't classify: in the actual paper we ran probably a dozen different models, some of which are propensity-score matched — looking only at the people who can be scored with high confidence, in male–female pairs — which addresses this issue that some of the data has more unknowns. And those models look exactly the same as the models with all the data, so I think we've addressed that.

With respect to the unequal base rates — I think that was question three — like computer science, where there are just fewer women: in all those cases we're normalizing by the population distribution. For example, in the group-fairness case, if it's 80 percent men and 20 percent women, that's being taken into account: we expect the overall distributions of the two to be the same, even though one of the distributions is sparser, because there are just fewer women. Now, in some cases, if there are way too few women, you just can't get a statistically significant result, and in those cases we're throwing it out, because we can't say anything.

And then with respect to the algorithm changes — that's question two. Because we were trying to be kind to the services, the data was collected over roughly two months for each one. So if there were changes happening, they would be embedded in the data, and there's not a lot I can do about that. I guess we could take the data and stratify it by when it was collected, to see if there's a systematic difference between month one and month two, or week one and week two. If I were looking at LinkedIn, I would be a lot more concerned about that, because I have the preconceived notion that they're a lot more technically sophisticated. These companies are not. Indeed might be technically sophisticated; Monster and CareerBuilder are not. I really don't think they're doing live algorithm updates.

[Audience:] Another question — I wanted to make a follow-up, an add-on to a previous question that was raised over there. Considering that systematic inequality is usually due to a process of artificial inflation — where you have a system that prioritizes certain groups at the expense of other groups — when there's a gap now, or discrimination, which was caused by artificial inflation: as a way forward, do you think these ranking systems, or these companies, should also artificially inflate their results in order to make up for the gap that's being caused?

Yeah, so I very much agree with that. In terms of individual versus group fairness, I am in favor of group fairness. In the American parlance, this is affirmative action. So yes, that's exactly what you would do: you would take the disadvantaged group and inflate them — increase their ranks systematically — to compensate for the social process that systematically depressed them in the past. I think that's what people should do; I think that's the goal we should strive for. But I'm just, you know, an ivory-tower academic.

[Audience:] Could you elaborate a bit on that second point — why is it so difficult to achieve both individual and group fairness at the same time, and what do you picture by "some point in between"?
Yeah. So, depending on the base rates of a given attribute in the population, you can show that you can't have both. Let's use a hypothetical — I'm trying to come up with something that's not offensive. Say all people who smoke cigarettes go to community college, and say you want fairness between smokers and non-smokers. If the base rates of education between smokers and non-smokers are just systematically different — essentially bimodal — then individual fairness says the smokers always go to the bottom, because they only have community college and everyone else has a four-year degree. That's individually fair. But if we say we want group fairness, then the two groups should be interleaved: smoker, non-smoker, smoker, non-smoker. And that's incompatible with sorting based on education. So if you have this kind of situation, where the base rates in the population diverge, you can't have both.

So that gives you sort of a knob: we can say we want individual fairness, we can say we want group fairness, or we can pick something in between. And what that "in between" means is unclear, because now it's not obvious what you're optimizing for. Is it "I want to be group fair within some threshold" — let's say the disparate-impact threshold in the law — "and then, within that error range, I can re-sort people by individual fairness"? Or is it something else entirely, like optimizing for user experience, to match recruiters' expectations? It's unclear; that's an ill-defined point in the space.
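To see the tension numerically, here is a toy version of that smoker example — my own illustration, not from the talk's slides. Five smokers have only a community-college credential (coded 1) and five non-smokers have four-year degrees (coded 2); no single ordering can satisfy both fairness notions.

```python
# Ten candidates: (group label, education score).
smokers = [("smoker", 1)] * 5         # community college only
nonsmokers = [("non-smoker", 2)] * 5  # four-year degrees

def avg_rank(ranking, group):
    ranks = [i for i, (g, _) in enumerate(ranking, start=1) if g == group]
    return sum(ranks) / len(ranks)

# Individually fair: sort strictly by education. Equally qualified people
# tie, but one group monopolizes the top of the list.
by_education = sorted(smokers + nonsmokers, key=lambda c: -c[1])
print(avg_rank(by_education, "non-smoker"),  # 3.0
      avg_rank(by_education, "smoker"))      # 8.0 -> badly group-unfair

# Group fair: interleave the groups so the average ranks nearly equalize,
# but now less-educated candidates outrank more-educated ones.
interleaved = [c for pair in zip(smokers, nonsmokers) for c in pair]
print(avg_rank(interleaved, "smoker"),       # 5.0
      avg_rank(interleaved, "non-smoker"))   # 6.0 -> individually unfair
```

Any single ordering of these ten people can improve one of the two numbers only at the expense of the other — that is the knob described above.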
I mean, it's it's Very much gloom and doom But but I see the need for that right there's kind of an overton window of the debate around Technology and society right and I think we need people on one end really pushing for these are the terrible things that could happen If we're talking about you know police adopting software to you know determine bail and do predictive policing I mean that the social consequences are so pernicious. I Feel like somebody has to get out in front of that even if the worst case doesn't happen just Elucidating like the worst case could happen I think that's really important right that serves a really important function So in terms of you know making weapons of math destruction required reading. I think it's important You know even if it is a little hyperbolic just to say like These terrible things could happen right and you're gonna go out in the world and build systems and use data and collect data Right you have to have this on the back of your mind. I think that that serves an incredibly important function I'm Kathy's is probably the best in my opinion. I mean there's a lot now Meredith Whitaker has won Sophia Noble I Mean that there's there's a bunch and they all have you know slightly different angles and I think they're all valuable But I would say Kathy was kind of their first and I like her book a lot I think we have time for one last question if there's one last burning question if not may I'll ask a question which is The methodology that you've talked about about auditing systems I think is really interesting applied in this particular context, but if you were to abstract it to other areas what you have done How should we think about auditing algorithms as a research community? And what do you think people should be doing more of in terms of interrogating these types of black box systems and Ways that we can do that from being outside the companies Yeah, so there's a bunch of things You know right now the auditing community is kind of small It's very academic and our tools are garbage frankly right we build these things on the fly I think there needs to be a more sustained effort across academia and investigative journalists Foundations to build auditing infrastructure and part of that has to be in franchising people It can't just be me you know tickering with JavaScript and my students Giving people tools that they can record data donate data see the results of data You know give them little knobs to tune where it's like here's the newsfeed algorithm yesterday and here's it today Right, what are the differences? Do you care about those differences? I think that's really important The the complexity of the systems and the speed at which the systems change has to be matched by an equivalent auditing infrastructure and building that's going to be a monumental effort The other reason that needs to happen is eventually You know NSF support for auditing will will run out right now There's like new science here. How do you measure things? 
But eventually that's not going to sustain the work; it has to be sustained from outside. There's also a lot that has to be done in the policy realm. Companies don't love this, and there are potential legal implications to doing it, which is discouraging in any number of ways. So normalizing this as a practice — the way that security research and white-hat hacking have been more or less normalized — I think is really important. Whether that's changing corporate culture, changing policy, or changing law, I think there are potential remedies in all of these realms.

Great — so thank you once again, Christo.