Hello, welcome. This is Brian Rowe from LSNTAP, and we're really happy today to be doing a new webinar, one that we have not done in the past: it's on data ethics when designing civil justice interventions. It's being put on by the Florida Justice Technology Center, who has been a great partner for us to work with this year, and we've got some really good speakers. I'm going to turn it over to the Florida Justice Technology Center to take us through this, and I'll have some announcements at the end about our upcoming webinars. Thank you so much.

Great. This is Wilneida from the Florida Justice Technology Center, and also a fellow at the Data & Society Research Institute. I'm really excited to have today's speakers because they're at the forefront of these big data discussions. They're engaging at the national level, with academia and with the private sector, around issues related to the implications of big data, in particular for civil rights and low-income populations. So we're really lucky to have them come and share their insights and their tips on how the civil justice community can take what they're learning through their work and use it as it builds its own technologies and uses data.

We have Solon Barocas, a Research Associate at the Center for Information Technology Policy at Princeton, who will speak first; then Ali Lange, a policy analyst at the Center for Democracy and Technology's Consumer Privacy Project; and then me, the Digital Officer at the Florida Justice Technology Center and a fellow at the Data & Society Research Institute.

The agenda for today: I will introduce the issue with a general overview of data ethics when designing civil justice interventions. Solon will then take us into a really interesting conversation about how machines learn to discriminate; some people might be wondering what exactly that means, and he'll unpack the idea of machine learning as it relates to data. Then Ali will present a framework for digital decision-making that programs may want to use, and we'll hopefully have five or ten minutes for any questions folks have.

Just as a basic introduction, a lot of you may be asking what exactly big data is. So that we're on the same page: big data includes many types of data, whether structured, semi-structured, or unstructured, and it can come from traditional and digital sources inside and outside of your organization. What makes it so large and so encompassing is that we're not only talking about the data that exists in an Excel spreadsheet, or the data your organization may be collecting about its own programs. We're also talking about census data, the various community sources of data that may be out there, court data, and data collected by various policy programs. And when we look at unstructured data, we're looking at text: emails, the Facebook, Twitter, or other social media interactions you may have with your clients. All of that is data. So I want to stress that when we talk about big data, we're not only talking about Excel-based structured data.
We're talking about all the various avenues, inside and outside of your office, where information is being collected about clients or about yourselves.

Not to lead with a fear factor: we're all beginning from a starting point where these vast amounts of information we are creating about our lives, and documenting, have the power to improve our lives. I think a lot of groups are excited and interested in that, and all of us here are as well, because it often does. That's not the problem. The problems arise when big data is used absent a human touch and with a single-minded focus on efficiency; that can lead to troubling patterns which can isolate groups that are already at society's margins. When we start taking these vast amounts of data and building models, predictive functions, and judgment calls from many pots of data without checking them carefully, that's when it can create conditions that are harmful, especially for the communities the civil justice community tries to serve.

Wrapping some of that up: researchers in academia, and we're lucky to have Solon because he's one of the people who studies this, have found that big data analytics can discover "useful" regularities in a data set that are really just pre-existing patterns of exclusion and inequality, and that models can inherit the prejudice and bias of prior decision-making. That may be hard to grasp, and I understand that, but essentially it's saying that when we put variables together and start to discover patterns in the data, we often don't realize that some of those patterns have built into them the prejudices or biases of the people who created the framework and assumptions behind them. This is where a lot of the issues arise, and the speakers that follow will explain it a little better.

Another tenet of using big data in a civil justice context is accepting that we all have biases. Once we start from the premise that we all have biases built into our decision-making, we can better understand the importance, when we work with data, of being aware of those biases, and of starting from the position of "I need to look at this carefully," because we're all human, and hidden biases can creep in through forms like predictive analytics in ways we might not realize. That's what we need to check.

So what do we do now? Big data is out there, and everybody's excited, including myself; I'm constantly looking for ways to use different data sources to gain insights about the communities we serve. What do we do to make sure our use of data is effective and useful, and not detrimental? The speakers that follow will talk about other specific things you can do, but I like to start from a place of simply developing a plan, and always putting in the forefront a personal responsibility, to yourself and to your clients, to look at the ethical, security, and privacy issues that come with data.
If that is the single thing you take from this presentation, I will consider it a success: simply knowing to temper that excitement about big data with caution and awareness of the various pitfalls hidden in how we use data. A few quick things about what that plan would look like; the infographic that's posted as a handout has a more elaborate overview, but I'll quickly go over a few of the points.

Step one is knowing your data. The Federal Trade Commission, which, as a lot of you may have heard, is very active on this big data and civil rights front, has issued a few papers, and they recommend looking at things like quality, accuracy, and usability. Accuracy is a big one. A lot of people may want to understand something like the eviction rate of a certain community, but they have only a small data set, maybe just the data from their own legal aid program, and they feel they can make conclusions about what is really taking place in that area. Sometimes people have to step back and say: I don't have data that's representative enough to answer that; looking at a very limited data set, this is the only insight I can gain. It's about being realistic about the data you have and not making overreaching predictions. (There's a small sketch of this point below.) Usability means making sure the people making decisions or analyzing the data are properly trained and aware of all of these nuances as well.

Step two is to examine data sensitivities for the communities you're hoping to serve. There's been a lot of discussion on the LSNTAP listserv about this already, especially in regard to the LGBT community. If there are specific communities that might be adversely affected by your data collection, reconsider: do you need all of that data? Always think about the ramifications of collecting those types of data, and of using and analyzing them, and how that might affect certain communities in a bad way. So that would be step two.
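To make that accuracy point from step one concrete, here is a minimal Python sketch, with entirely hypothetical numbers, of the statistical margin of error on a rate estimated from a small caseload. Even this understates the problem, since a caseload is not a random sample of the community:

```python
import math

def margin_of_error(successes: int, n: int, z: float = 1.96) -> float:
    """95% margin of error for a simple proportion (normal approximation)."""
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical: a legal aid program saw 120 housing cases, 30 of them evictions.
cases, evictions = 120, 30
rate = evictions / cases
moe = margin_of_error(evictions, cases)
print(f"estimated eviction rate: {rate:.1%} +/- {moe:.1%}")
# -> 25.0% +/- 7.7%, and even that assumes the cases are a random sample
# of the community, which a legal aid caseload almost never is.
```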
Step three, whoops, is that there are actually existing consumer protection laws that are increasingly applicable to big data practices, and this slide collects some of them: the Fair Credit Reporting Act and several equal opportunity laws. You might also be in a state that has more progressive state or local laws. Getting a lay of the legal environment your big data analysis work takes place in is a really good way to know the historical precedents and the issues that have come up, and it also helps you gain insight into some of the ways your clients should be protected when you're using data.

Finally, try to include and empower your clients. This is a very new thing, but think of ways of making your clients aware, maybe not necessarily of your own big data practices, but of the existence of data in their lives: how your program may be using their data, how other programs may be using their data. The point is to build literacy, because ultimately, even if your program is doing all of the right practices ethically, we still want these communities, in their day-to-day lives, to simply become more aware of how they should be cautious about data. If there are simple ways you can do that, like putting together a flyer you distribute to your clients to help educate them about their rights when people collect their data, I think that is a really good grassroots way to build data literacy in a community of people who probably have no other way to get this information, unless they have older kids who are savvy, or they themselves have an interest and do their own research.

Looking into the future: there are a lot of changes going on in the civil justice context, and this presentation is a very introductory attempt to kick-start what hopefully will be an ongoing conversation about data ethics. As the years progress, with various state and national groups looking at predictive analytics, triage algorithms, justice portals, expert systems, and things like document assembly, all of these questions are going to become increasingly important, and the context will continue to change; the laws might change. This presentation is an attempt to take a very complex conversation that currently exists mostly in academia and the private sector and bring it into the civil justice community, in the hope that people will preempt some of the issues that may come up in the years ahead as the civil justice community looks to build these more sophisticated big data tools.

So, looking to the future: will the national civil justice community need a multidisciplinary data ethics committee or institutional review boards, maybe an entity that can review the civil justice community's data projects to make sure they meet ethical standards? As programs increasingly use data, for triage algorithms or for their own portals, will we eventually need a responsible data program manager in each program?
These are open-ended questions. I don't think they're applicable right at the moment, but they are a place where we might be in the years to come. So at this time, I'm going to turn it over to Solon Barocas from the Center for Information Technology Policy at Princeton University so that he can start his presentation. A reminder: if you've got any questions, please feel free to type them, and we are watching the questions area here. Go for it.

Hi, thanks so much for having me here. I appreciate the opportunity, and I'll hopefully have a chance to follow up on a fair amount of the important points that Wilneida already introduced. Let's see, hopefully I have... yep, okay, great. I'm just checking to see if I have control over the slides.

How discrimination law works may be very familiar to some of you, given the work that you do, but I thought this would be a quick way to provide some background before I dive into some of the more big-data-specific issues that are of concern in the civil rights community. In particular, an important distinction to draw at the start is between the two doctrines in discrimination law. I do this in part because we often imagine that people who are programming systems to make important decisions that affect people's life chances might be acting with conscious malice or the intention to purposely discriminate. We have an existing body of law, disparate treatment, which allows us to handle those cases nicely: whether someone is making a decision as a human on their own to discriminate, or has designed a program intentionally to do the same thing, the law should handle those cases in the same way.

But of course there's also the disparate impact doctrine, which allows us to capture cases where there may not have been conscious choices to discriminate, but where the way decisions were made nevertheless has a manifest disparate impact along the lines of race, gender, or disability. Here what's often important is being able to show statistical discrepancies in outcomes according to what are known as protected classes, things like race or gender.

I introduce this in part to say that in many respects big data seems to be a way to actually combat disparate treatment. People often refer to hiring decisions as an example, where decision-makers either are consciously prejudiced in the way they decide who to hire or, more likely and increasingly so now, are unconsciously biased in their assessments; this is implicit bias. The hope is that data, and the move to make these kinds of important hiring decisions more data-driven, could be an important vehicle to combat the persistence of discrimination in hiring. But what I'll try to show today is that, unfortunately, there are some rather subtle ways in which using data to make hiring decisions can still result in potentially avoidable disparate impacts.
What I'll try to focus on here is whether the existing doctrine of disparate impact can still be a useful tool in making sure that, even when decisions are data-driven, we have some legal recourse in the event that they are discriminatory. But first let me try to give some substance to this idea that data-driven decisions can be discriminatory.

There are some important examples, which are also cited in some of the FTC materials mentioned earlier, but I'll walk through them and give you a bit more context. This one comes from an important article that Kate Crawford wrote a few years ago about an application developed by the mayor's office in Boston, known as Street Bump. Street Bump was a very clever idea. The city was not especially effective at locating potholes, and some people realized that perhaps they could crowdsource the problem: they could have people install an application on their smartphone which would take advantage of the fact that it has a built-in GPS chip, but also has an accelerometer, which would recognize when people were driving over uneven road. If your phone shakes a lot, that might be an indication of the location of a pothole, and the phone would automatically, in the background, without people having to do anything, report that back to the city. So this was a clever way of relying on citizens to contribute to the hard work of detecting potholes.

This was an important example for Kate because what the city quickly discovered is that there was a significant bias in the data that was coming back, in particular what social scientists refer to as a reporting bias. You might anticipate that not everyone in the city owns a smartphone, and, even less surprising, that even among smartphone owners only some small set of people would know about this application and decide to install it. So the data coming in tended to skew very much toward wealthier areas, or areas where there were young professionals, and the data the city was receiving was not really representative of the entire population; in particular, it probably significantly underrepresented the poor areas of the city. If the city were to direct its resources according to these reports, it would probably have the perverse result of focusing more attention on those areas that are already relatively well off. So this is an important first point: the data sets we use to make important decisions can be affected by the ease with which data is captured, or the ease with which information is reported back.
And it's a tricky problem, because oftentimes the data we are able to collect is biased in systematic ways. It may be, for reasons like I just mentioned, that certain people are less involved in the sorts of data-generating activities that would be necessary to produce these traces, or less involved in the formal economy; think, for instance, of people who don't own credit cards. They could have less access to, or be less fluent in, the technologies that produce these kinds of digital traces. Another possibility, which I imagine is something you consider in your own work, is that certain populations might have historical reasons to avoid contact with institutions that would otherwise collect important information about them. And finally, if the data we're using comes from the private sector, it may well be that certain parts of the market are perceived as less profitable or less important, and therefore not as well monitored.

So in practice, even though people who are very excited about big data tend to think of it as capturing the activity of an entire population, it tends to be what social scientists call a convenience sample, meaning you're actually just collecting information that is easy and convenient for you to collect. These data sets in general lack the rigor that would be common in more traditional social science research, where you purposely set out to collect a representative sample.

The problem, ultimately, is that even if you're sensitive to these issues, as Boston was, it can be very difficult to figure out how to compensate for them. In the case of Boston, they did something pretty clever: they realized that their garbage trucks were already moving throughout the city, so they just used their own existing garbage trucks to do this work rather than relying on residents' phones. But in some cases it might be hard to even figure out what mechanism to rely on to cultivate a more representative sample of the population, and that just requires a lot of careful thinking and potentially some creative solutions, even once you recognize that you might be dealing with a biased sample.
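To make the reporting-bias problem concrete, here is a minimal Python simulation in the spirit of the Street Bump example. The neighborhoods, adoption rates, and pothole counts are all made up; the point is only that identical underlying conditions can produce very unequal report counts when data collection depends on who owns and installs the app:

```python
import random

random.seed(0)

# Hypothetical app-adoption rates by neighborhood; the roads are identical.
adoption_rate = {"wealthier area": 0.60, "poorer area": 0.10}
actual_potholes = 100

for area, rate in adoption_rate.items():
    # a pothole is reported only if a passing driver happens to run the app
    reported = sum(1 for _ in range(actual_potholes) if random.random() < rate)
    print(f"{area}: {actual_potholes} actual potholes, {reported} reported")

# Allocating repair crews by report counts would favor the wealthier area,
# even though both areas have the same underlying problem.
```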
Okay, let me now take an example from employment. I mentioned earlier that the hope is that data could be a way to overcome some of the prejudices and biases that persist in hiring decisions. So now consider a company that wants to automate the process of assessing job applicants; this is increasingly something that a lot of low-wage, high-turnover employers are doing, because they're often flooded with applications. Imagine a situation in particular where you're looking at historical records: which of my previous employees performed well, which of the people I hired in the past went on to demonstrate that they were really effective on the job. You're using this historical data to tease out what seems to be unique about the people who tended to be the high-performing employees.

Now imagine that employers in the past had systematically passed over applicants because of their gender or race. Even when the system being developed doesn't rely on explicit things like gender or race to make decisions, but instead considers, perhaps, where someone went to college, it could discover that in the past no one who had applied from Howard or Wellesley, for example, a historically black college and an all-women's college, had been hired. The machine, simply looking at the historical pattern of how previous decision-makers evaluated applicants, would learn to see these signals, Howard or Wellesley, as a sign that this person probably wouldn't be good on the job, that this isn't someone the company would want to hire. So here is a way in which, as Wilneida was saying earlier, big data models can simply inherit the bias or prejudice of previous decision-makers.

This problem becomes tricky in part because of how we evaluate whether the decision procedures we've developed from historical data are accurate: we take a subset of the data from the past, where we already know whether someone was a well-performing employee, run it through the model, and see what the model predicts. If it predicts that the known high performers are high performers, we say the model is accurate. The problem is that the data we're using to evaluate the model is equally tainted; it's been equally biased. So there's no obvious way to rely on this evaluation method to decide whether the model is biased, because the way we evaluate the model uses the same biased data.
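Here is a minimal sketch of that evaluation trap, with a fabricated hiring history and a deliberately crude "model." The schools echo the example above; all numbers are invented. Because the held-out test split comes from the same biased record, the model scores well while reproducing the old pattern:

```python
# Fabricated hiring history: (school, hired). The pattern encodes past
# bias, not applicant quality: nobody from Howard or Wellesley was hired.
history = ([("StateU", 1)] * 40 + [("StateU", 0)] * 10
           + [("Howard", 0)] * 25 + [("Wellesley", 0)] * 25)

train, test = history[::2], history[1::2]  # split the SAME biased record

# Crude "model": predict the majority past outcome for each school.
model = {}
for school in {s for s, _ in train}:
    outcomes = [hired for s, hired in train if s == school]
    model[school] = int(sum(outcomes) > len(outcomes) / 2)

accuracy = sum(model[s] == hired for s, hired in test) / len(test)
print(model)                             # Howard and Wellesley: always 0
print(f"test accuracy: {accuracy:.0%}")  # ~90% -- the test set is equally tainted
```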
But let me make it even more complicated. Imagine a situation where, yes, people had been hired from Howard or Wellesley in the past, and those people went on to have somewhat lengthy careers, but the evidence in their employment records suggests that perhaps they weren't the highest-performing employees: they were middle of the pack, or weren't particularly good at what they were assigned to do. What this fails to recognize is that oftentimes the data we record about people's performance can itself be a reflection of the conditions under which they worked. If the workplace had been, for instance, hostile to women, or there were certain institutional dynamics that made people from a certain racial group feel unwelcome, or the more desirable assignments and tasks were not given to them, it might then appear that these people weren't the best employees, when in fact that record is a reflection of the institutionalized and subtle discrimination that persisted in the institution itself. To use this data as an objective assessment of who is going to do well in the future just brings those dynamics even earlier into the process, into deciding who to hire in the first place.

This, I think, is a really pernicious problem. We have to be sensitive to what the data is actually capturing: oftentimes it's not just capturing people's innate capacities, but is a reflection, at least in part, of how they were treated and the conditions under which they worked. The question, then, is how someone would have performed under different, non-discriminatory circumstances, and this can often be an impossible question to answer in a statistically rigorous way. But it's an exercise you should at least walk through in your own mind when you rely on historical data to automate future assessments.

Okay, and the next point I wanted to raise: oftentimes we're not just choosing which examples of individual people to use; we're also deciding which features, which variables attached to each record, we're going to consider. For instance, when I'm developing a model to decide who to hire in the future, not only am I going to look at past employees, I'm also going to make a decision about whether to look at their annual review rating, or their email records. What variables am I going to fold into my analysis? This is often a complicated decision.

Oh, it looks like Allison may actually have a comment here. In the GoToWebinar there's an option to raise your hand; I'm going to try unmuting her and see if she can ask it directly. Sure. You should be unmuted on our side. You can check whether you're muted on your side, but we've tried to open it up there, and it does not seem to be working. If you're able to type that question or comment into the questions box, Allison from SRLN, please let us know, or unmute on your side.
I believe it's unmuted on our side, but we cannot hear you if you're talking. Okay, I'll continue for now, and hopefully I can address the question if you post it to the question box.

The point this slide is trying to make is that sometimes, in making choices about which variables to consider, we might produce an avoidable disparate impact. One way to think about this: historically, redlining, specifically mortgage redlining, was clearly pretextual; it was intentional discrimination, where banks relied on a known proxy for race in order to deny black people loans. In particular, they did this by saying that people who lived in certain areas of the city, say a zip code, should categorically not be entitled to loans because they're high-risk. This was an intentional practice designed to mask intentional discrimination. But imagine a situation instead where we have unintentional redlining. (I think whoever is controlling the computer... okay, I'll continue to talk while we get back to the slide.)

So here's the interesting challenge. Imagine we're a bank, and the only information we have is something like the average repayment rate on loans by zip code, which is in fact sort of what was being used in these traditional scenarios of redlining. But imagine current circumstances where we only have very coarse information: information that says that if you live in this zip code, on average you are likely to repay your loan at this rate. Because the information is so coarse, it might say that certain neighborhoods are, relatively speaking, much less good candidates for a loan. What's really problematic about this way of doing things is that if you had additional information, if you knew much more about the individual residents of those zip codes, you would discover, of course, that certain people would in fact be great candidates for a loan, and perhaps others less so. But because you're restricted to this extremely coarse information, you end up discounting an entire neighborhood out of hand. The question then is: does the information we rely on provide enough granularity that certain parts of the population aren't discounted out of hand?

Another way of thinking about this: when we decide what degree of error we feel is tolerable in making these assessments, we often aren't sensitive to how that error rate is distributed across the population. Is the error random, in the sense that whoever you might be, regardless of your gender or race, you have an equal chance of being subject to an erroneous decision? Or is the error systematic, meaning that if you live in this zip code, you are much, much more likely to be subject to an erroneous assessment? And here, I think, there's a really important question, even for people doing the important work of defending the poor: how much money and resources should we invest in collecting data that allows us to make more granular distinctions, so that we don't subject historically marginalized populations to higher error rates? I'll just say that this way of thinking tends to equate parity in accuracy across different groups with a notion of fairness: something is fair so long as each of us, no matter who we are, has an equal chance of being subject to an erroneous decision.
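A minimal sketch of that check, with fabricated records: compute the error rate per group, not just overall. An aggregate figure can look tolerable while one group absorbs most of the errors:

```python
# Fabricated (group, model_was_correct) records.
records = ([("inside coarse zip", True)] * 60 + [("inside coarse zip", False)] * 40
           + [("outside zip", True)] * 95 + [("outside zip", False)] * 5)

for group in sorted({g for g, _ in records}):
    outcomes = [ok for g, ok in records if g == group]
    print(f"{group}: error rate {1 - sum(outcomes) / len(outcomes):.0%}")

errors = sum(1 for _, ok in records if not ok)
print(f"overall: {errors / len(records):.0%}")
# The overall rate of ~22% looks like one number, but it is 40% for
# residents of the coarsely scored zip code and only 5% for everyone else:
# error that looks random in aggregate is systematic for one group.
```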
But I also want to quickly point out that in many circumstances there are cases where having less information, and not being able to achieve certain rates of accuracy, can have important benefits for historically disadvantaged communities. The obvious example would be something like: when I don't know how valuable a customer is, I have to provide a common level of service. Perhaps some of you saw the story from a few weeks ago about Amazon Prime same-day delivery not being offered across entire areas of major metropolitan cities. The thinking there was that some areas didn't have enough customers or do enough business, so the service was only offered in areas with a sufficient number of people. Well, if Amazon couldn't carve up the market that granularly, it would have to offer a more common level of service across the population. The idea here is simply to point out that sometimes improved accuracy can harm the poor and historically marginalized, because it allows profit-seeking companies, or even governments, to focus resources in ways that just reinforce inequality.

Okay, and the final thing I'll say before I conclude is that there are going to be these final cases, which are extremely difficult. Evolv is a company that helps perform the kind of task I was describing earlier, evaluating job applicants. The slide presented here is marketing material, which focuses on how they help employers figure out which of their current employees are likely to leave, to turn over or engage in attrition in the next few months, and the idea, in this material anyway, is that the company could take steps to intervene and retain people so that they don't lose that particular employee. In practice, however, companies like Evolv and others are very often used in assessing job applicants: not people who are currently employed, but people applying for employment.

We know an important anecdote about Evolv, because Evolv was hired by Xerox to help it evaluate applicants for its call centers. Call centers are very low-paying jobs with high turnover rates, and training is a significant cost.
So Xerox and others are trying to minimize the turnover rate; they want people who are going to be on the job for more than a few weeks or three months. What Evolv was able to do is figure out the variable that best predicts whether someone will stay for, let's say, a six-month period, and it turns out that the distance a person happens to live from the place of employment is the best predictor of how long they're likely to stay, which may not be terribly surprising; the idea is that there's a fair amount of commuting involved. But the reason I point this out is that Evolv itself recognized that distance from work was often going to be highly correlated with race, given racial geographic segregation in the United States, and so Evolv counseled Xerox not to use this variable, even though it was statistically the most predictive feature.

What this reveals, I think, is that there are lasting historical effects of racial discrimination in particular, which mean that even when companies are simply pursuing something that seems like a rational business objective, they can end up engaging in activities which have a manifest disparate impact. And ultimately this is going to be a really tricky problem, because in many cases even those features, those variables, that seem statistically relevant and on their face perfectly reasonable to consider in making these assessments are increasingly going to be correlated with things we should not be considering. This is what computer scientists have described as redundant encodings: my gender or race or religion or disability status, those facts about me, will often be highly reflected across multiple other dimensions of the data you have about me. So even when you're considering seemingly benign, even relevant facts, you're potentially considering things that are highly correlated with protected attributes. (A small sketch of a test for this appears below.)

This then presents a very difficult problem of making hard choices about whether or not to proceed. In the case of Xerox, it decided no, it would not use this feature, even though that would reduce the accuracy of its predictions. Other companies have come to a different conclusion; they felt comfortable using it despite knowing that it potentially had this disparate impact.

Oh, and the final thing I'll say is that I don't want to suggest that using data to make important decisions is necessarily a bad idea. In fact, using these systems to make hard choices often brings into greater clarity a lot of the important and difficult political choices we make, and this can be an opportunity: rather than obscuring the stakes of these important decisions, having to formalize them in some way can allow you to have a more effective debate about the values they potentially endanger.
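One rough way to test for the redundant encodings described above, sketched here with fabricated data mirroring the commute-distance anecdote, is to see how well the supposedly neutral feature alone recovers the protected attribute. If it recovers it far above chance, a model that uses the feature is, in effect, using the attribute:

```python
# Fabricated applicants: (commute_miles, group). Residential segregation
# makes commute distance track group membership in this toy data.
applicants = [(2, "A"), (3, "A"), (4, "A"), (5, "A"), (3, "A"),
              (12, "B"), (15, "B"), (11, "B"), (14, "B"), (13, "B")]

threshold = 8  # guess group membership from the "neutral" feature alone
hits = sum(("A" if miles < threshold else "B") == group
           for miles, group in applicants)
print(f"group recovered from commute distance alone: {hits}/{len(applicants)}")
# 10/10 here: a model allowed to use distance is effectively allowed to
# use group membership, even though that attribute never appears as input.
```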
So I'll just stop there. Oh, sorry, one final point: if anyone is particularly interested in what I presented here and would like more legal background, I have a paper on this topic, co-authored with a legal scholar, that goes into much greater depth. So thanks very much.

An interesting thing to take away, as Solon highlighted, is that some of these emerging companies are starting to use big data to decide who's a good worker, who's a good person to rent my apartment to. It's important for advocates to know that there are people in the private sector increasingly using big data who are not so ethically inclined in the way we're outlining it here. So a lot of these issues are eventually, in five or ten years or sooner, going to be at the forefront of grassroots work with clients on various worker and labor rights issues. Understanding big data is both an advocacy issue that we'll work on directly with our clients, and something for us to keep in mind as we use big data ourselves.

On the practical side, though, I will definitely mention that Northwest Justice Project engaged in a research project last year looking at our foreclosure data and the national foreclosure databases, along with a researcher from the University of Washington. We had to work through: who's going to have access to the data? What does the data sharing agreement between us and the researcher look like? How much of that data can be shared in a publication later? How do we protect clients' privacy and confidentiality? All of that went into the data sharing agreement. So this stuff is very practical once you start partnering on projects, or using the giant amounts of data that many legal services organizations have, to try to find trends in your own community.

So thank you, Solon, for all that. Some of this first half of my material will be pretty redundant to Solon's excellent introduction, so I will try not to take too long going through the beginning part, and then I'll explain what CDT has done with the work. I wanted to quickly explain a little bit about who CDT is. CDT is a nonprofit organization based in Washington, DC. There's a little list here of things we're well known for, but basically CDT is a twenty-year-old organization that has been advocating on technology and internet policy since the beginning of the internet. It's really an interesting place to work; there's a lot of really good work being done there, from internet architecture and the structure of the internet all the way through to online privacy, including everything in between, like free expression online and security and surveillance concerns, what the government knows about what's happening on your technology. So it's a great place to work.

We have been working with academics and civil rights organizations in Washington, DC and around the country on some of the issues that Solon raised. CDT worked with some civil rights organizations and other advocacy organizations, and we were trying to figure out what to do with all the information and the insight that Solon just presented. Essentially, one thing we wanted to focus on, and what CDT is best at and well known for, is helping companies make good decisions on the design end, right?
Rather than sort of letting things go too far and then having to walk them back, CDT is pretty well known for helping forestall some of the worst-case scenarios. So we took the work that Solon had done, along with a lot of other academics, and we set about creating a project to help companies who are interested in making good decisions figure out what they need to do with their own technology. The advice and insight we have here was specifically focused on the private sector, but I think a lot of it could be applied more broadly, to anybody who is creating an automated system and is concerned about some of the things Solon has pointed out and some of the issues we've seen raised. So that's the nature of the project.

Again, this is a bit redundant to Solon's presentation, but I just wanted to make the point that automated decision-making systems and algorithmic processing are really present everywhere, and they can have a pretty big impact on individuals. We get asked a lot what's different about this kind of discrimination, and what's different about these concerns compared to other situations, and I think it's important to remember the speed and the spread of the technology: if one of the problems someone mentioned exists in a system, it will be applied across the entire network really quickly, and it will affect a lot of people. So it's really bigger than basic consumer protection concerns. And a really interesting observation worth noting, which relates to Solon's point about how error is distributed: for one single individual, it's sort of variable whether an individual decision affects them, but if you're looking at society at large, you can see the huge concerns with discrimination and bias when they're perpetuated across everybody all at once.

The Civil Rights Principles for the Era of Big Data was a document written by a group of civil rights organizations in Washington, DC; you can see the signatories here at the bottom. This group created some basic ideas of what they thought needed to be applied in these technologies to promote civil rights and make sure the laws Solon pointed out were not undermined in this context. You can see the list here, and it's a really good list; it was well thought through, and a lot of really good organizations signed the document. On the other hand, if you look at these instructions and imagine handing them to somebody who is creating this technology, they aren't really something concrete you could sit down with and say, okay, we're going to do these things. They're not directives; they're not really straightforward. These are principles, really high-level observations. So we took these as the jumping-off point, CDT did, and asked: how do we actually apply this? How do we help people use these principles and integrate them into the technology they're building?

And, hi Ali, this is Wilneida. I'm sorry this keeps popping up, but can you remind us again what year that was? The Civil Rights Principles for the Era of Big Data came out in 2014, I'm pretty sure. Okay. Yeah, so it's been out for a few years. Good. Thank you.
They're cited in the White House big data report, if you review it; you can see them in there. The document was pretty well received, so it has already made a pretty high-level impact.

So we sat down and said: how do you apply these things to technology? A few things to observe. The stakes are high, as we pointed out, for individuals and also for society, so it's not just traditional consumer protection; it's really much bigger than that. On the other hand, if you think about the huge number of ways in which automated technology is used, it's really hard to come up with concrete, hard-and-fast rules that you can apply across the board. There are some examples, as in Solon's presentation, where it actually can be beneficial to have some of these insights and some of this data for preventing discrimination, and there are some where it's obviously bad. It just depends on the context in which you're using the automated decision-making whether having some of these pieces of data floating around is more constructive or destructive. So it's hard to create one concrete concept.

What we did first was sit down and say: okay, how do people create automated technology? What are the processes people use to create these systems? So we created this sort of map. It's not universal, but we divided the process into four, really five, phases. You can see here: you design the system, you build the model, there's input, you implement it, you test and refine it, and then you evaluate and execute your decision. And we sat down and said: what goes into each of these steps?

So here you have coming up with just an idea of something to automate. Why are you doing this? What's your goal? What's your mission? What's the actual process you're trying to create? This first step is really important; actually, this whole first row is pretty critical, because here's where you ask yourself some of the really key scoping questions. Here you have to figure out: what are the parameters of the problem? What are we trying to figure out, and what kind of results do we expect? You establish your variables, you constrain them, and you figure out which variables we think relate to which outcomes. And then here is where a lot of Solon's insights on data, the actual data that goes into the system, come into play: where is your data coming from, what is contained within it, and how do you know if it's good or bad? The next part is related, which is how you're going to analyze this, and whether the tools you're going to use have any sort of bias built into them.
And the data question, I think, is probably the easiest to grab onto: where is your data coming from, and what were the biases of the people who collected it in the first place? Bias in this context doesn't necessarily have to be something active; as Solon pointed out, in some ways it can be passive. It can be an absence of information. It could be that people are not represented, or that they're represented in ways that are not fair from the beginning. So this top row contains a huge number of steps and a huge amount of potential for the process to become flawed by design; this top row is really critical.

From there you go forward and say: okay, what kind of technology are we going to use? Are we going to use machine learning? Are we going to let the technology proceed unsupervised, or are we going to stay involved? And you have to design a feedback mechanism so you can evaluate your own technology and your own use of it. Then you might implement your model, and you have to go through the process of figuring out whether it's working, testing it. Maybe you add more data; in that case you want to go back and make sure you're running through this part of the process again. Then over here you ultimately end up with some results, and ultimately you execute a decision.

We created this just for us; initially I created it just for my own information, trying to figure out how people use this technology and how it runs. Then we thought: okay, now we have a sense of how people create automated technology. Maybe not all these steps happen in every case, but in general this is the process that's followed. How can we help people who think in these sorts of contexts integrate some steps that will prevent some of the harms Solon pointed out? So we split it up into these general categories, and we thought one of the most important things people can do is just ask themselves some basic questions, rather than letting themselves, or their process, or their employees, or their company, proceed unchecked, without creating any background against which to investigate their own assumptions.

So here are a few questions we think are relevant in the design stage. As I mentioned: what's the source of your data? Was it collected by people, or was it automated in some way to begin with? And what was the incentive structure of the people who collected it? This comes up a lot in the context I think you all are working in: if you're thinking about police data and criminal justice data, a lot of that is collected by people, by officers. What are their incentive structures? Do they have quotas they have to meet? Are there neighborhoods they more frequently visit? What are the qualities that affect that data from the beginning? So it's not that the data is without bias from the get-go.
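One simple check on the "who is missing from your data?" question, sketched below with hypothetical proportions, is to compare each group's share of your data set against an external benchmark such as census figures:

```python
# Hypothetical shares: your data set versus a census benchmark.
dataset_share = {"group_x": 0.05, "group_y": 0.95}
census_share = {"group_x": 0.30, "group_y": 0.70}

for group, share in dataset_share.items():
    ratio = share / census_share[group]
    flag = "  <-- underrepresented" if ratio < 0.8 else ""
    print(f"{group}: appears at {ratio:.2f}x its census share{flag}")

# group_x shows up at 0.17x its census share: conclusions drawn from this
# data will systematically understate that community's experience.
```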
One important consideration may be whether the information was initially handwritten or was always machine-readable, only because an actual error may have been made in the translation from handwriting to machine-readable format; again, I think that's likely to be a concern in the criminal justice context.

Then there are some further questions for the rest of the design section: how could you overcome this? How could you clean the data? Is everyone represented, or are there other populations who aren't? Is there anything you think should just be explicitly prohibited from your process from the beginning? These are a few questions we thought would be relevant to ask in that initial design phase. Going back briefly to the wheel: these are questions you ask yourself at these steps, and what we're going to do is go through similar questions that we thought people should ask themselves all the way around it.

So now, if you're in the phase of building your model, what are some questions you need to ask yourself to make sure you aren't making some of the mistakes that have raised concerns in the past? Here you have some things that are a little bit harder to pin down. Generalizations and cultural assumptions are not an obviously concrete assignment. It's interesting how often we get pushback from people, particularly in the credit context, where they say, for example: women are a better credit risk, so if you're saying we can't include gender, you're actually creating a disadvantage for women by doing so. That's something we hear a lot; there are entire papers written on this question. We push back and say we disagree: we think it's more likely that there are other cultural situations or cultural biases that tend toward the result you're seeing, which is that women are a higher or lower risk for you. It may be the case that women have historically had to meet higher standards in order to get credit to begin with, or that society discourages women from taking risks from the beginning. So the bias is actually not about gender; it's about some other cultural relationship, and ascribing it to gender perpetuates it in a way that's concerning, but also doesn't really get at the actual problem or the actual reason. In some cases it's just bad science. This is a very hard thing to get people to think about, but I think it's important. One thing you could ask yourself, if you're trying to figure out whether you're making an assumption or relying on something with concrete science behind it, is whether you'd be okay with people seeing your reasons. If you said, "women are a lower credit risk," ask yourself: why do you think that's true?
If you would feel comfortable explaining it, then maybe you're in a somewhat better position than if you would feel uncomfortable explaining it.

There are some technical tools here as well that we think people should ask themselves about: are there ways you can remove people from a category of suspicion, or a targeted category, if they don't need to be involved in it directly? Here's a question that relates to Solon's point: are there proxies around, and how much of the statistical process requires them? How can you prevent those proxies from becoming part of it? And this last question, I think, is really hard for policy folks, but maybe a little more easily understood by talking it through with technical people: is it okay if, at the end of the process, you can't really explain why the answer is what it is? Is that acceptable in your context or not? That's a question only you can answer.

Here are a few questions we think you should ask once you're testing. Again, getting back to Solon's point about error rates: what is the acceptable error rate? How right do you need to be before you're okay putting this thing out on the market and releasing it to make decisions about people's lives? And is the error rate evenly distributed? That's maybe something that's hard to test, but I think it's a really important check. Here's a question about which factors are predominant in determining the outcomes; it gives you an opportunity to reflect on whether those factors are fair, whether that's something you feel is right. Most of these, in general, are about trying to figure out the connections between the results you've gotten and any concerning reasons for those results.

One important process is to create a feedback mechanism, and probably a way for individuals to report if they feel there's been an error or a mistake, so that you're not entirely relying on your own back end to determine the results. And once you've implemented it, step back and think: what happens if there's a false positive or a false negative? How do we deal with that? What does that mean for people? Again, just trying to connect it back to individuals themselves. Here's another point where you have a chance for people to let you know if they feel they've been treated unfairly; you can map those reports out and see if there's an underlying problem, a commonality among the people who feel they've been treated unfairly. (There's a small sketch of this idea below.) And then, importantly, we think you should have a human being sitting somewhere in this process. Is there a method for a person to look through this, and if so, where in the process do you put that person? What are their responsibilities? What are their obligations? Does this person make the final determination, or does the machine make the final determination? How do you figure out who ultimately holds responsibility?
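Here is a minimal sketch of that complaint-mapping idea, with hypothetical field names and reports: log each unfair-treatment report with a few attributes, then look for concentrations rather than handling reports one at a time:

```python
from collections import Counter

# Hypothetical unfair-treatment reports logged with a few attributes.
reports = [{"zip": "33101", "outcome": "denied"},
           {"zip": "33101", "outcome": "denied"},
           {"zip": "33139", "outcome": "denied"},
           {"zip": "33101", "outcome": "denied"}]

by_zip = Counter(r["zip"] for r in reports if r["outcome"] == "denied")
for zip_code, n in by_zip.most_common():
    print(f"{zip_code}: {n}/{len(reports)} of the complaints")

# If one zip code dominates the complaints, that is a signal to re-open
# the design-phase questions for that population instead of dismissing
# the reports case by case.
```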
And then, as you're going through and changing the system, all the information you get from those processes needs to be reintegrated into it. When you're doing that, you have to go back to the original design phase and make sure you're not introducing new data that is biased after you've done your initial checks; you have to make sure you're integrating new technology or new information in accordance with the ethics of the process you want to create.

So we sat down and said: here's a bunch of questions we think people should ask themselves. Then we took the original wheel we showed you and put the questions, in somewhat pared-down versions just to make them a little easier, on the outside of the ring. So here's the design phase: when you're in the phase of getting, buying, or generating your data, what do you need to ask yourself? These are trimmed versions of the questions we just went through. Ultimately, this is still a document in progress, because we're working with a few technology companies and other folks who use the technology directly; CDT is an advocacy organization, so we're working with folks who are in the field using this technology to make sure this is productive. But the goal is to create a one-page document that a policy team or a designer could have at their desk as a sort of mindfulness check: what kinds of questions should I be asking myself as I go through this? How do I figure out what the civil rights community would like me to know?

So we created this document, we hope people will use it, and we're still working on it. If you have any feedback about what you think needs to go in this outer ring, what you might find useful, or what's confusing about the process, we'd be very excited to hear it. That's where the project is right now.

Ali, thank you so much. Do you mind if I circulate this handout through the listserv and get people's reactions as well? Yeah, definitely. I wanted to make sure of that, so I'll circulate it. We're past two o'clock already, but thank you so much. Are there any questions? Because before they get off, I want to ask Solon a few things. Brian, let me know if you see anything on your end. Nope.
Ali, thank you so much. Do you mind if I circulate this handout through the listserv and get people's reactions as well? Yeah, yeah. Great, I'll circulate it, because we're past two o'clock already. Thank you so much. Are there any questions? Before we get off, I want to ask Salon a few things. Brian, let me know if you see anything on your end. Nope, no questions.

Okay. So for me, just so we can get it on video and at least some people can reference it later: Ali, do you envision this process being ongoing, or does it really belong at the beginning of a project, with maybe a quarterly look back at this guidance? How do you think this should be used with a specific automation project, mostly at the beginning, or as an ongoing check?

It certainly seems true that the bulk of the concerns, as far as creating and embedding bias, arise in the initial phases; you can see in this initial row that a lot happens where the risks are pretty obvious and pretty substantial. So I certainly think this should play an important role in designing your process. But once it's created, you can't just let it run, because it's not entirely clear, at least with the kind of processing technology we're using today, that it will stay in the zone you've tried to create. So it's both. One of the things we're trying to figure out with people is, when you have a bunch of different automated systems that work in tandem but are not necessarily tied to the same data, or not necessarily processing on the same system, how do you help sort that out? How do you get systems to interact with each other while keeping some of these things in mind? There are a lot of bigger questions that need to be answered, but I think in general the answer to your question is that the testing, refining, and evaluating phases should all include this process throughout the life cycle of an algorithm. The bulk of the concern, though, probably sits in that initial design phase.

And beyond that, it seems clear that whenever you're using a service that is collecting data in some way, a lot of those services will update their terms and the amount of information they're collecting on you or on other people. So I would definitely have some type of quarterly or twice-yearly check on those systems to see how they line up with your ethics and with the standards, defaults, or features that you really want to enable or use, along with the data that's being collected, because they're going to change progressively.

Yeah, this is amazing. I haven't seen something like this that breaks it down into stages; it's amazingly useful, so I want to circulate it extensively to the community.
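As a rough sketch of what that quarterly or twice-yearly check could look like in practice, assuming nothing more than a list of the services you rely on and the date you last reviewed each one (the service names and fields here are made up for illustration):

```python
# Hypothetical reminder list for periodically re-reviewing services whose
# terms and data collection practices change over time.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ServiceAudit:
    service: str
    last_reviewed: date
    review_interval_days: int = 90  # roughly quarterly

    def is_due(self, today=None):
        today = today or date.today()
        return today - self.last_reviewed >= timedelta(days=self.review_interval_days)

audits = [
    ServiceAudit("example-intake-form-vendor", date(2016, 1, 15)),
    ServiceAudit("example-analytics-service", date(2016, 4, 1)),
]
for audit in audits:
    if audit.is_due():
        print(f"Review due for {audit.service}: re-read the terms, check what "
              "data is now collected, and confirm defaults still match your plan.")
```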
And then my final question, for either of you: is the FTC still the main federal institution for advocates to keep an eye on for how big data practices are being regulated, or are there other institutions starting to try to guide big data practices that we should watch? Do you want to start, Ali, maybe, and then I can come in too?

Yeah, definitely. I think the CFPB is one agency you want to look at, especially with respect to credit and lending. I think they're going to be more involved in what kind of data is acceptable and how decisions have to be made, and maybe in re-evaluating some of the transparency requirements around data-driven decisions made outside the context of the FCRA, which is the major guideline. I think the Department of Justice has also taken a keen interest in local law enforcement's use of data analytics, and they're trying to figure out what guidance they want to give and what requirements they want to impose. But it will depend a little bit, as you pointed out in your advance materials, on the fact that a lot of this is being done through third-party vendors, so it will depend on how much power agencies have to control what goes into those systems, which have now moved out of the government's open data sphere and into the proprietary technology world. I'm not sure exactly how that will work out. Okay, thank you.

And I'll just add very quickly to that: to the extent that data is now playing an important role in these kinds of consequential decisions across various domains of life, other regulators definitely seem concerned. For instance, the Equal Employment Opportunity Commission, the EEOC, the employment regulator, is definitely thinking about this with respect to hiring and employment decisions. The Treasury is similarly interested, like the CFPB, in some of these issues specifically related to banking and consumer finance. So I think there's a substantial amount of interest among federal regulators. I also think there is increasing interest at the level of state and some city regulators, which is maybe less well known and less well explored, so it's probably worth investigating whether your state or your local municipality is also engaged in some of these conversations. And the very final thing I'll say is that the White House itself continues to lead some of this discussion. It released a document, I guess two weeks ago now, that is a kind of summary statement of where the thinking and research are on the question of bias in data-driven decision-making, and I encourage people to look at that as well.

So when Trump becomes president, we can hope this will continue, right? It feels like a document that's designed to be useful no matter who continues the work.

Two quick comments, the first of which is from one of the people in the audience: I think it's important to have a social science researcher to consult. They're trained to think about these representations of data, omitted variable bias, that type of thing, and it's often possible to find someone like that through a local information school or some of the master's programs. I strongly agree with that comment. I would also add that state legislatures are becoming more and more interested in this particular area, and we're seeing state-by-state passage of laws giving individuals information about their data or requiring reporting if there are data breaches or other things like that. So I would also try to stay up to date with what your state is doing.