 All right, I know some of you are still getting your food in the back of the room, but maybe we'll start things going. That was really fast, I appreciate that, thanks. Thanks everybody for being here, my name's Chris Bavitz, I'm a clinical professor here at the Law School and, along with a couple of our panelists, a faculty co-director at the Berkman Klein Center for Internet and Society, and we're really thrilled to welcome you to today's event. This is, in some ways, one of our usual Tuesday luncheon events we host at the Berkman Klein Center. For those of you who do not come to those regularly, please go to our website, sign up. They are not usually quite this big, but we do feed you and offer you scintillating conversation and discussion almost every Tuesday, so sign up for our mailing lists. If you're not able to come in person, the events are typically live streamed, as is this one, and will be archived online later, so you can tune in. I mention that just in case you ask a question later, as we hope you will; just be mindful that you're live online. We are here today to talk about a broad range of topics that I think generally fall under the umbrella of algorithmic fairness, algorithmic justice. At the Berkman Klein Center, we have an initiative underway around ethics and governance of artificial intelligence, and a significant swath of that work touches on some of the issues we're gonna be talking about today, very specifically in the criminal justice system, and also more broadly as we deploy and develop, and as particularly government actors procure, algorithmic technologies to do things like performing risk assessments on criminal defendants and the like. What factors do they need to be considering? How do we do our best to ensure that those systems are not biased, or as unbiased as we can make them? And we have an incredible group of panelists here today to talk to us about these topics: Chris Griffin and Jonathan Zittrain here from the Law School, Margo Seltzer and Cynthia Dwork from the School of Engineering and Applied Sciences. And rather than my sitting here and reading their bios to you, I'm gonna maybe go down the line and ask each of them to introduce themselves, talk a little bit about the work that they do relevant to the subject matter of what we'll be talking about today, and just offer a few general words on their perspective on sort of algorithmic fairness and justice, and from there, we'll turn this into a discussion. Again, we'll be sure to leave hopefully about 15 minutes at the end for questions from you. And Chris, since you are seated immediately to my right, I'm gonna ask if we can start with you. Sure. Good afternoon, everyone. My name is Chris Griffin, and I am the research director of the Access to Justice Lab here at Harvard Law School. The A to J Lab was founded in the summer of last year, and one of our founding projects is in fact on pretrial risk assessment. Everything that we do at the A to J Lab is pursuant to a randomized controlled trial. So I hope that as the conversation progresses, we'll talk specifically about why we chose that methodology for the lab, and why and how we're applying it to the question of pretrial risk assessments. 
And just to let you know a little bit about myself, my background is in empirical law and economics, and I came to the lab having analyzed data after the fact, not looking at how we can evaluate prospectively how new interventions in the legal system are affecting both adjudicatory outcomes and outcomes that arise after the courthouse steps. And one of our, again, signature evaluations is of pretrial risk assessment, and it is an evaluation of an algorithm. And I am happy to say that we're doing that. Unfortunately for this conversation, I don't have any results to share with you because our first- These things take a lot of time. They take a long time to do, and our first evaluation launched only in April of this year. But I can say at the outset, from where I sit right now, that a big part of the conversation regarding the use of algorithms, and especially in the criminal justice system, is what is our counterfactual thinking? If you have deep concerns about the use of these tools, hopefully what we'll talk about this afternoon is what the alternative would be. And we know that a big part of that is unguided human decision-making, which is not the worst thing in the world, but it certainly is not the best. And so our job, as we see it at the A to J Lab, is to really understand how much more predictive power these algorithmic tools are providing to our judicial officials. Excellent, Chris, thank you. Margo? Hi, I'm Margo Seltzer. I'm a professor of computer science in Harvard's John A. Paulson School of Engineering and Applied Sciences. I'm also the faculty director of the Center for Research on Computation and Society. We call ourselves Circus and we have a grand old time. And I also feel like I'm a bit of an imposter on this panel. So my research is primarily in the area of systems, not AI, not fairness, not law. But I have, in fact, for the last year, year and a half, been working on a form of model building called transparent models. And I've become a huge fan of the concept of transparency in a model, which is that instead of entrusting any kind of decision-making or advice-giving to a black box, I would prefer that I could give you, in the words of my colleague Cynthia Rudin, an index card, so that you could understand exactly what the model is saying and what it's doing. And then maybe we could have a discussion about how one would apply what we've learned from that model. So I'm going to be arguing about transparency more than anything else. Terrific. Cynthia? So I'm also in computer science. And I'm new to Harvard. I only started here in January. I've been coming at fairness from a theoretical perspective. So I'm a theoretical computer scientist and I've been trying to look at how would we actually define fairness? And once we had a good definition, how would we be able to determine, how would we be able to generate decision-making algorithms that adhere to our definition of fairness? And how can we test other algorithms to see whether or not they're fair? Interestingly, I'm much less convinced of the transparency question. And I'm trying to put some thoughts together for a presentation tomorrow that will address that. But we'll fight. Do we get a preview today? And we're both black belts, just so you know. Excuse me, I'm a third degree black belt. I'm only a first. Jonathan. So degrees of black beltedness go like burns rather than like DEF CON? Yes. OK, good. I'm Jonathan Zittrain. I am a professor both at the Law School and in the computer science department here. 
And one of the co-founders of the Berkman Klein Center for Internet and Society. And represent. And I already hear some great themes. I'm looking forward to plumbing them, including transparency. And I would break that into transparency of the overall technical system you might be renting, deploying, or contracting out for. And especially when we think of the administration of criminal justice, I think it's a great question to ask: what parts should a municipality or other public authority roll its own versus what should be contracted out to a private firm that then might be proprietary about what it does? I'd love to see us avoid the mistakes that have been made with voting machine technology over the past 20 years on that front. And then there's also the question of transparency. If you do get to see the algorithm, how much do you feel like you have any insight into the kinds of decisions it's making? It may be that transparency doesn't solve a whole lot. And then there's also the question of predictive power. Something might actually have terrific predictive power. And then also, by some accounts, be incredibly biased. If we take as a hypothesis that the police may be more likely to arrest one type of person or one ethnicity over another, then the baseline recidivism rate on which you would say, is this disproportionate or is it fair, is itself influenced by the behavior of law enforcement before something even enters the courthouse. And thinking about how you would try to adjust for that, if at all possible, in an algorithm, I think is a deeply profound and difficult question. Finally, I'm interested in the use of AI, not just to judge an individual who's standing before the criminal justice system, which has to make decisions about that person, but rather holistically, to what extent could AI shed light on the root causes that might say, this is why you have so many cases coming through your courthouse. Here are the inputs; maybe an intervention here, thinking in SimCity terms: time to build a stadium or a park or something. Maybe those are the sorts of things that AI could shed light on, and then suddenly what had appeared threatening might actually be quite interesting. That's great. So many places we could start there. I'm inclined, since it came up in, I think, all four of your introductions in one way or another, to start with this transparency question. And I can say one of the most interesting things to me about being involved in this broad initiative we have around algorithms and justice, and the interplay between artificial intelligence and these kinds of powerfully predictive tools that Jonathan mentioned, has been getting computer scientists and lawyers in the same room to talk about legal concepts and CS concepts that seem to map neatly onto one another, but often don't. Transparency is one of those, where we have a notion in law of due process, where at the end of the day if the state is gonna make some decision about me, we want that to be done via some process that is fair, understandable, and open to the public in some ways. And in some ways that fits neatly with, I think, the ideas that Margo and Cynthia were talking about when they mentioned transparency in the context of computer science. Maybe I'll start with you, Margo. Can you talk a little bit more about what you mean when you say you've become a strong proponent of transparent systems design? 
Sure, so what I mean by transparent system design is that if a system makes a decision about you or comes up with a prediction about you, you can say, hmm, and why did you make that prediction? And I can give you an answer in some way that is actually comprehensible to you. And I draw the line at comprehensible, which is that a reasonable person could look at the model and have some sense of why it predicted the way it did for them, as opposed to a set of numbers that seem to map to certain features that don't map to anything recognizable about you. So that's what I mean by transparent. So if I can give you a set of statements that are of the form, you know, if you are speaking on a panel and you're wearing purple, then we predict you are in favor of transparency. You know, else, if you are a professor (It's been true every time I've seen it, so far. Yeah.) not wearing purple, then we predict false: you are not a proponent of transparency. A model that is that easily interpretable is my goal. And is that hard to do, again, from a technical perspective? From a legal perspective, that seems very much in line with what I think lawyers would expect from interactions with the justice system, that if I walk into a court and they say, your risk score is X, and Jonathan walks in and his risk score is X plus three, that we should be able to understand the distinction between those two. But yet in conversations that we've been having, again between lawyers and computer scientists here, it's been suggested that in some ways, that's not always as easy as we would want it to be. So I guess there's two answers. So one is, I would like to be able to say, why is my risk score lower than Jonathan's or higher than Jonathan's? So that's the question I want to be able to answer. And then there's a second piece I wanted to say, which is, oh, how hard is that? Technically, it seems to be, and I won't say that we know this with certainty, but it seems to be that in some cases, you can achieve higher accuracy with techniques that do not produce these models. But I don't think we know for sure, unless Cynthia will correct me, that it is always the case that you can do better with a model that is not explainable. Is that fair? Yes, it's fair. So the class of models that includes both the explainable ones and the, let's say, uninterpretable ones is strictly larger than the class of models that are interpretable, and therefore it's certainly feasible that there are things that have better accuracy in the uninterpretable part. I should say people like Cynthia Rudin have been rediscovering or re-implementing techniques from the 80s that had been left fallow, that have as their highlight that they are explainable. And as you say, it doesn't mean they might not be better, but the explainability might be an important part of the legitimacy of a process. But I'm also thinking in theory of David Weinberger's recent work, where he's been talking about, what if there are aspects of reality that are causal, there is a path, but that happen to depend on such a baroque set of variables, so large and complicated, that they just don't boil down to a formula like F equals MA. Now that may or may not be the case. It may turn out that there are often lion's share kind of power law explanations, 80% of the phenomenon is accounted for by 20% of the categories of data or something, but it doesn't have to be that way. 
It could be that there's a category of knowledge that we haven't accessed yet because we've only been applying the poor human mind to problems, and the human mind can only think in terms of F equals MA or E equals MC squared or something, and the equation itself is just so long you get bored by the time you're halfway through reading it, and at that point it is explained, it is transparent, but it's not really explainable, and I'm not sure what would happen as we get more and more AIs capable of generating predictions without theory. Right, Cynthia? So, I think Jonathan's completely correct, but I would like to just go back even to the question of these simple rules lists that Margo is talking about. So the way the rules lists work is that they're of the following form typically. You have antecedents and you have probabilities. So let's say you're trying to classify mushrooms as to whether they're likely to be poisonous or not, and you have certain features that you look at: maybe whether the stalk is smooth or rough, maybe whether the mushroom has a foul odor or no odor or smells like anise, or maybe different kinds of shapes of the head, is it flat-headed, is it a very conical head? So you have methods for generating rules lists that are fairly simple, that will say if the stalk looks this way and the head looks that way and the smell is such and such, then the probability that it is poisonous is P1. And then there's another rule, which is really an else-if, so you have antecedent one implies probability P1 of being poisonous. Else antecedent two has probability P2 of being poisonous. Else antecedent three has probability P3. So when you look at this rules list, you may think that the explanation of the probability that comes out is that you have satisfied a certain antecedent, but that's not the whole story. If you get to antecedent three, then in fact you have failed to satisfy antecedent one. You have failed to satisfy antecedent two, and you're actually now at antecedent three. So it's not A3 that... it's not just that A3 is true, it's not A1 and not A2 and A3. And my suspicion, and I have an undergraduate who's building an experiment now to test exactly this sort of thing, my suspicion is that people simply don't understand that they're making a decision based on this more complicated formula, not A1 and not A2 and A3. It's not just that A3 is true. And so the whole question of why was something classified this way, even in this incredibly simple scenario, is much more complex than initially meets the eye. So can I respond to that? Oh please, yeah, I completely agree. What Cynthia says is absolutely right, and here's how I think about it. Let's imagine instead of deciding if mushrooms are poisonous, we're deciding why you got turned down for a loan, okay? And so the examiner can point and say, well, you fell into this clause, and let's say it's the third one, that said no, we don't give you a loan. A question on your mind might be, well, what could I do such that next time that won't happen? And so even if I can't compute all the compound probabilities that are actually resulting in rule three, I could in fact look and say, oh, well, look, this rule up here would have said give me the loan, but I don't fit into it because I didn't finish school or my income isn't this. 
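To make that concrete, here is a minimal sketch, in Python, of a rule list of the kind Cynthia and Margo are describing. The loan features, thresholds, and probabilities are invented for illustration; the point is that the rule that fires carries with it the negation of every antecedent above it, and that Margo's recourse question (which earlier rule could I have satisfied?) is a different question again.

# A hypothetical rule list: ordered (antecedent, probability-of-denial) pairs,
# evaluated as if / else-if / else. All names and numbers are made up.
RULES = [
    ("finished_school and income >= 50000", 0.05),
    ("years_at_job >= 3 and not prior_default", 0.15),
    ("prior_default", 0.80),
]
DEFAULT_P = 0.40  # applies if every antecedent fails

def classify(person):
    """Return (rule index, probability, the full condition that actually held).

    Reaching rule k means every earlier antecedent was false, so the real
    explanation is "not A1 and not A2 and A3", not just "A3".
    """
    failed = []
    for k, (antecedent, p) in enumerate(RULES):
        if eval(antecedent, {}, dict(person)):   # eval is fine for a toy example
            prefix = " and ".join(f"not ({a})" for a in failed)
            full = f"{prefix} and ({antecedent})" if prefix else f"({antecedent})"
            return k, p, full
        failed.append(antecedent)
    return len(RULES), DEFAULT_P, " and ".join(f"not ({a})" for a in failed)

applicant = {"finished_school": False, "income": 30000,
             "years_at_job": 5, "prior_default": True}
print(classify(applicant))
# Margo's recourse question is different again: which earlier, more favorable
# rule could this applicant most cheaply come to satisfy next time?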
So I think that even though it's way harder than just looking at the clause you did fall into, you have a better chance of at least responding to or understanding something about how this decision was made, even though it is, as Cynthia says, way more complicated. Now I couldn't agree more fully, but the question of how do I find... you're asking, what is a cheap and reasonable path that I could take to come to a different outcome? That's a very different mathematical question from the question of why was I classified a certain way? And it's conceivable that you can build algorithms that are not interpretable which still enable the how-do-I-get-to-a-different-outcome-at-a-reasonable-price question to be answered. Chris, you had your hand up. My thoughts on this are, first, that at least in the criminal justice realm, your classification under an algorithm is often a combination of static factors, over which you did have control but you no longer do, and some dynamic ones, namely interviews, especially in the pretrial release context. So when we think about transparency and classification, a lot of it, at least in the criminal context, can't be boiled down to what could I have done differently at the time of scoring, except for perhaps different answers to the interview. And so I want to maybe offer a hierarchical or multi-level approach to thinking about transparency, and this is again informed by the work that we're doing with the Public Safety Assessment. The first thing, which I really doubt anybody in the public is going to get their heads around in terms of understanding how this works, is the true underlying research that produced an algorithm. Be it a standard econometric model, be it certainly a machine learning process, we could expect transparency there. Again, I don't know how much that's going to advance the objectives of the public understanding what's happening. So then we move to how is that research translated into scoring mechanisms. And there we could focus on the inputs, not so much on how those inputs are translated into outputs, but what inputs are we generally using to come up with the risk scores. But I'll argue even more importantly: if you have an ordinal risk scale from one to six, for example, does the decision maker actually know what the difference is between a two and a three, or a three and a five? Perhaps, perhaps not. What I think matters the most is how those numbers are then translated, if at all, into actionable recommendations. And there, I think, is where transparency is most needed, because if the decision maker has a set menu of options that are anchored by the risk scores, I think the public is most interested in understanding how we translate those scores into the decisions that our public officials are making. Can you say more about that? I don't understand what you're saying well enough. Sure, okay, so in the specific context of the PSA, you have underlying, in that case, proprietary research that generated the model to predict risks of failure to appear or committing a new crime. Wait a second, so when you say a model here, you mean something where I feed in the data from this individual and I get out a number saying... Correct, so there was an econometric model that was fit by the researchers and is now built into software used in counties and states around the country. When you feed in that information, nine static risk factors, it spits out three outputs. 
A risk of failing to appear on a one-to-six scale, a risk of committing a new crime on a one-to-six scale, and a zero-one flag if the person is at elevated risk of new violent criminal activity. It would be one thing to spit out those scores, hand those to the judge at initial appearance, and say, your honor, let us know what you think about detention or release; but what they've done with the PSA is convert, through a matrix that is chosen by the jurisdiction, those risk scores into specific recommendations. And another thing about these algorithms is we wanna ask ourselves, do they generate required action by a state official, or just another piece of information to guide their decision making? And in the case of the PSA, it's merely a recommendation. Right, done. Well, one thing that Chris's remarks illustrate is just the boring aspect of the critical... not that the remarks were boring. No, thank you, thank you. I just wanted to intercept that chuckle before it completed. The boring aspect of data quality, and whether your model as implemented is able to reflect the quality of the data going in. You've got somebody doing an interview of somebody, and they ask three questions and they didn't really get an answer, so they move on to the next one, or they're not certain about their assessment if it's calling for any discretion on the part of the interviewer. Those are things you could represent in a model but generally don't. And what it means is that by the time it has funneled into a single score, much less through a matrix into a recommendation, there's no representation anymore of the quality of the data that went in, and the people doing the interviews may have no clue that that answer to that question could be the difference between freedom and confinement for the person they're talking to. So that's something you could systemically try to work on, and in fact you could study just how good the corpus of data is in practice, not just in theory. There was one other point I was going to make, but I may have lost it, so I'll just interrupt it. But really quickly, to your point, Jonathan, one of the things that we've learned in our ongoing and prospective evaluations of the PSA is that that data quality issue is one that is highly, highly fraught across the country. Not because... Because just what if it turns out there's crappy data, full stop? I believe that the data that are collected in each system are usually collected quite well. The problem is, we know, we talked about voting earlier, we know that voting systems are adopted county by county usually, or it could be that the supervisors of elections are at that local level. The same is true for the criminal justice system, in that court databases, jail databases, prosecutors' databases are all housed at the county level, and none of them ever talks to each other. And so one of the most important issues about making sure that the data inputs are as solid as possible is hopefully educating states that if they're building new databases or reforming their existing ones, please create unique identifiers that abstract from personal information but that allow the state, and then evaluators like us, to be able to track individuals across the system. Without that, algorithms are somewhat useless because you can't track the inputs and the outcomes in a reliable manner. 
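As a rough sketch of the pipeline Chris describes, and nothing more: some model, whose internals we take as given, produces the two one-to-six scores and the violence flag, and a matrix chosen by the jurisdiction, not by the model, turns them into a recommendation. Every cut-off and label below is invented for illustration; this is not the PSA's actual weighting or decision framework.

# Hypothetical score-to-recommendation mapping. The model's scores come in;
# the jurisdiction's matrix, not the model, picks the recommendation.
RELEASE_MATRIX = {
    # (failure-to-appear band, new-criminal-activity band) -> recommendation
    ("low",  "low"):  "release on own recognizance, no conditions",
    ("low",  "high"): "release with conditions / supervision",
    ("high", "low"):  "release with conditions / supervision",
    ("high", "high"): "most restrictive option available in this jurisdiction",
}

def band(score):
    # collapse a 1-6 ordinal score into a coarse band (the cut-off is invented)
    return "low" if score <= 3 else "high"

def recommend(fta_score, nca_score, nvca_flag):
    rec = RELEASE_MATRIX[(band(fta_score), band(nca_score))]
    if nvca_flag:  # elevated-violence flag tightens the recommendation
        rec += ", plus violence-specific conditions"
    return rec     # a recommendation to the judge, not a required action

print(recommend(fta_score=2, nca_score=5, nvca_flag=1))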
I should say I remembered my second point. It gets back to Cynthia's mushrooms, which is, mushrooms are a natural phenomenon; they evolve in a particular way that might mean that there would be something amenable to the kind of simple series of antecedent and consequence statements that you can make about them. But imagine if you could just design the mushrooms, and design them as deviously as you please, at which point you could devise an arbitrarily complex rule set with, like, the slightest exception for what otherwise looks 100% safe. Oh yeah, in this case it has some really modest quality that makes it poisonous. It would then be fair to ask, for systems like these, where might the incentives be to game it, if you're creating the qualities on which judging will happen. And I almost... I think about the tax code and the ways in which that is purposefully recondite, and there isn't secret tax law, it's all there, but you can read the tax code all night and still have no clue what it's actually telling you to do, and it's in part because there are exceptions being written in, meant to be describing generically a situation that only applies to one company. If you transpose that into the realm of justice or something else, you could see a lot of problems with those models, and it's almost a recommendation to say I'd rather trust machine learning getting feedback from the world rather than some simple model that gets deposited, written by a human. Cynthia, did you have something you wanted to say in response to that? Just applause. Well it does... I mean, your comment earlier, Jonathan, about the interviewer who may not understand the import of the question that she's asking, actually one might view that as a feature, not a bug, I guess, if you're trying to get human decision making out of these equations, but it assumes a lot about the way the overall model was designed at the outset, that it was done such that someone who doesn't know the impact of what they're asking actually gets to the right result. You seem skeptical, Margo. Well, so I want to get to the other half of it. Yeah, please. Which is, so we're talking about what happens with models and software, and there's sort of this implicit assumption here that, well, clearly... I mean, the suspicion around machine algorithms stems from some inherent and I believe misplaced belief that humans are actually fair and unbiased. Now we know that that's not true. Like, we just know that that's not true. Is there anyone here who wants to argue that fact? Okay, maybe judges making decisions are less biased than the typical human being, but I'm going to claim they're still biased too. So now the question is what happens when you have a human who is biased and maybe a piece of software, a piece of software which maybe is biased. I'm not even sure how you can make any statements about the two of them, but I'm hoping Cynthia will at some point solve that problem. But I guess I would like to live in the fantastical world where the person who's ultimately making the decision, so assuming it's a recommendation, can in fact take a model with some understanding that it might be biased in certain ways, and maybe there are ways we could describe that, and that that combination might actually be able to make a more impartial decision. 
Because at least if I'm a human and I'm thinking about bias in a model, I might actually then mentally be more predisposed to thinking about my own bias as well, because we know that one of the things people have a hard time doing is confronting their own biases automatically, but that sometimes, if you can prompt them to think about it, they actually do a little better. And so maybe the model serves as a way to do that. But does that... so one of the arguments in favor of these kinds of risk assessment tools in general is an argument for consistency. It's an argument that different judges in different counties or different judges in different states will reach a more consistent set of results the more we rob them of discretion and reduce this to numbers. And I'm wondering if what you're saying, Margo, pushes back against that, because you're suggesting that we ought to be allowing the judges to use these scores in some way but to have those scores informed by their broader understandings of criminal justice and fairness and due process. Yeah, I'm gonna get that back to the lawyer. I'll say, if we lived in a world in which these algorithms, these risk assessments, were binding on the court, there would be a number of legal problems with that, but also practical ones, and just the ethos of our system would be lost. We rely on the judicial official to use his or her wisdom, experience and discretion to make these decisions. We like to think of these tools as not even necessarily de-biasing mechanisms but information-enhancing ones, those that reduce... or, sorry, increase the signal-to-noise ratio. So leaving aside biases that may correlate with observable characteristics, we also know that human beings are imperfect observers of the world around them. And so misclassification could be not connected to a suspect classification like race or sex; it could simply be a misunderstanding of the risk that anybody poses with a particular profile. And we are thinking about the PSA and running some simulations right now to suggest what the likely effect of a tool like this could be, and a lot turns not only on how poor the observational abilities of the judges are but also what difference that variety in observational skill makes. How different are people? Because if they're already very different and misperception makes them look a lot more alike, then the risk tool arguably separates people more easily, whereas if people are already more alike, misperception is likely to keep them all bunched together, and the risk assessment tool is probably not gonna do a lot of work. I'm skipping over a lot of the very specifics, I'm talking very vaguely and generally, but the point is that we like to think of this... because bias is such a negative concept in a lot of people's minds. I don't think we were mentioning it in that way, but bias could also just be termed the inability to perceive with sufficient clarity. If the data on which algorithms are being trained and models are being built is biased, because, as a number of us have said already, the system's imperfect and we have a historical system that has relied on humans' imperfect judgments, such that the data that Chris mentions, about whether you're likely to flee or return for your trial, whether you're dangerous, or whether you're likely to commit a violent crime, if we accept that those data sets are themselves the products of biased decision making, is there any hope of training that bias out of these systems as we build them? Garbage in, garbage out? 
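One toy way to see the worry in that question, with entirely made-up numbers: construct two groups with identical underlying risk, generate historical labels from a decision process that over-flags one of them, and "train" the simplest possible model on those labels. What comes back is the bias, not the reality.

import random
random.seed(0)

TRUE_RISK = 0.30      # identical for both groups, by construction
OVER_FLAG_B = 0.20    # extra chance of being labeled high-risk if in group B

def historical_label(group):
    # the biased "ground truth" the model gets to train on
    p = TRUE_RISK + (OVER_FLAG_B if group == "B" else 0.0)
    return 1 if random.random() < p else 0

N = 5000
data = [(g, historical_label(g)) for g in ("A", "B") for _ in range(N)]

# "Training": per-group label frequency, the simplest learner imaginable.
learned = {g: sum(y for grp, y in data if grp == g) / N for g in ("A", "B")}

print("risk the model learned:", learned)                  # roughly A ~0.30, B ~0.50
print("risk that actually exists:", {"A": TRUE_RISK, "B": TRUE_RISK})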
Is there a way to... So generally, generally I would say garbage in, garbage out. Yes. There are some attempts in the literature to try to debias. I haven't found them theoretically compelling and in general I think that they're problematic but I don't know. So maybe it's possible to combine a small amount of unbiased data with a lot of historical data and somehow or other leverage the two of them to debias the historical data or to somehow learn models that are less biased. So that seems a lot less hopeless. I suddenly feel a little bit hopeless on this to one of your earlier questions, Chris. If you maintain a lot of discretion that could have the countervailing effect of reducing the continuity that risk assessment tools are designed to produce. I think that the continuity that we're really looking for is that focal point that would be available in all cases to interact with the judge's discretion. Here's an example of that to speak to the garbage in question. With the PSA, you will have, if you are a judge receiving a report, a list of what the nine risk factors are and how they stack up with the defendant's criminal history. In some jurisdictions they're not only giving you the yes, no or the count of how many failures to appear there were but they will tell you if there was, if there was any conviction, what was the crime for which the individual had been convicted. And if the judge wants to look at that background and say to him or herself, you know, maybe some of these risk factors are so tainted by bias earlier in the system I will discount their applicability to this case and that will allow me to depart from the recommendation. I might, if I believe that the risk factor is extending the prediction of risk too far I will back away from that and recommend a more lenient release condition. And so I think again there, you start with a focal point, there may be some anchoring bias that arises from that but I argue that's probably a good thing again given how variable unguided human decision-making can be but that same unguided human decision-making when combined with the risk factors can I think make the system a lot more just. I wanna pick up on a point that Jonathan made in his opening remarks and maybe tie it back to some of the conversations we've been having about transparency interpretability and that has to do with questions around the interplay between private commercial actors developing these tools and the state and state criminal justice systems that are applying them. Is there something, I guess the first question is is there something inherently concerning about that model the way we have state government, local government procuring software to do any number of things made by private vendors? Is there something particular about these risk scores that should give us a particular pause and then related to that I guess is the question of whether the kinds of interests we're discussing here around transparency, interpretability, explainability are in some way at odds with the private commercial interests of companies that develop these tools that may not wanna reveal their special sauce or whatever it is that makes their tool as good as it purportedly is. Any reactions on the state use of private commercially developed tools? So when faced with this question I like to appeal to the miracles of modern cryptography that allow us to prove things about secrets to prove that secrets have certain properties without actually revealing the secret. 
So in particular, if the secret is an algorithm and I want to prove that it has a certain property, there are techniques from cryptography that would allow me to carry out this proof and convince you that the secret algorithm has this property without actually letting you see the secret algorithm. And I would succeed truly if and only if the secret algorithm really did have this desired property. So at least in theory I'm much less concerned with this particular problem. There's just a whole lot of technology out there that could be leveraged if secrecy of the algorithm were truly the only problem. Go ahead. Well just I, of course, I think we'd all be grateful for the extent to which there are forms of proof that let the company keep the secret while still being able to probe and audit it. But there's the antecedent question of why should the company be keeping the secret? And I don't blame the company for wanting to develop a business model around something, but especially if you think there's gonna be more data accrued that would improve a model, the idea that that keeps, through feedback effects, getting into the company's hands and no one else can really use it. That just locks in whatever you've got and there's no form of competition for a better model. So I would love to hear if anybody would make the case, including from the companies, where if you're looking at it from a societal point of view, why you would possibly want to see this farmed out rather than have it developed in-house, and maybe a given county may not have the data science team needed to do it, but counties are organized into states, states are part of a larger country, like surely if we all chip in we can build an interstate. And that's the sort of thing that I think has just been not on the radar, it's a new phenomenon, you have an existing system and then there are entrepreneurial companies coming in to improve things. But I have yet to hear an argument as to why we would possibly want this to be with contractors at arm's length, especially if they are going to be able to claim trade secret over what they're doing, proofs notwithstanding. Margo, please. So I just want to agree completely and build on that. So first of all, the secrecy of the model is a totally independent thing from business model. This is something the open source community never quite understood, but open source is a license, and a business model is not a license, it's a business. So the same thing exists in tools like this, number one. There was a number two. You were talking about... yeah, so I am totally fine with private companies building these models, but I see no reason that they should be even allowed to do it in a closed fashion such that I have no idea what's going on. That just strikes me as worrisome, right? How many of you would take a drug because you plugged it into a black box about which you knew nothing other than we think the vendor's trustworthy, and you were then so comfortable taking that drug? Wait, that is 95% of drugs, is it not? I like to think of my doctor as a little bit different than a black box. I sometimes, when I'm truly bored, I read the insert with the prescription drug and there's always that little diagram of a molecule and I'm like, that's pretty. And then in the middle, it's like pharmacology. It's like, through an unknown process, headache goes away. I'll take it. Sorry. I'm that annoying patient who actually asks my doctor, like, why do you want me to take this? 
And actually I have one doctor that the day I walked into his office and he was about to prescribe me a drug, he said, okay, let me show you the following research papers about people who have a condition exactly like yours and what the results are. This is a doctor I can get behind. And this is exactly what I want to see in the algorithms that we're using to decide people's fate. That's a great example. That's a statistical argument your doctor was making. Your doctor was not walking you through the pharmacological process. Correct, correct, but he was walking me through the... So it's testing, not transparency. Well, correct, that was not transparency. That was a... because my biochemistry probably wouldn't be up to the transparency. Oh, and I think it truly is an unknown process. But I was really good with the statistics. Mm-hmm. Chris, you want to respond? After Chris, Dan will have a microphone, so if you want to ask some questions, please. Sure, I just have to make sure that everyone knows I'm on board with transparency as well. I think we noted that wearing purple was a sign of transparency. So to make that clear, but also to speak very selfishly for the A to J Lab, let's say that we can't really get to a place where we force or even convince companies' research outfits to reveal the research or even the underlying algorithm behind these assessments. What if we then get the proof through the best possible evaluation method? At the end of the day, what does the public really want to know, again? I don't think they really want to know, unless there's going to be individualized litigation over the outcome in Defendant A's case, what is behind the decision. I think people really want to know whether these things work. And so to take the drug analogy, which we really, really love at the lab, we are going to subject our legal innovations to the same clinical trials that the FDA requires of those drugs that all of us are taking. I can't wait 15 years from now to hear how it goes. I know, not 15, just two, just two, but we subject these ideas, whether it be a risk assessment, whether it be some new idea for enhancing civil legal justice. We provide these tools randomly in the field to decision makers to see if the outcomes are measurably better when those tools are used versus when they are not. And so we hope to be a part of this movement towards more evidence-based practices, but at the same time, making people feel more comfortable with them by showing them whether or not they stand a chance and, in fact, are working. And Chris, can I just ask you real quick? If it's not outlandish to think that, for instance, re-arrest rates could be greatly influenced by factors that they shouldn't be, by demographic, innate personal characteristics, the algorithm could happily just predict that; basically it's predicting what a police officer will do rather than what the accused would do, and how would we then adjust for that? I don't know if that's a de-biasing question or something else, but if you're just saying the proof is in the pudding, the clinical trials are gonna come out beautifully. Perhaps. I think the one suggestion I might have, and I'm unclear right now as to how this would work in practice, although there are rumblings of this occurring in locations around the country that I'm not aware of specifically, but giving such tools... I mean, think about the evidence that is overwhelming in the empirical literature on the disparities in stops by law enforcement, by race and ethnicity. 
Lord knows how many officers are using either improper or just wrong proxies to make these decisions. If you had something like a risk assessment at that stage, if you had these objective tools throughout the lifespan of a criminal case, that could be one solution, but I haven't given enough thought to, and I think we need to at the lab, the fact that, taking the PSA, an outcome is measured in terms of new criminal activity, not even as a charge, certainly not a conviction, but an arrest. That could very much be not related at all to the underlying criminal behavior but to the practices of law enforcement, and that was a choice made by the developers of the tool and thus one that we're sticking with for our outcome analysis. It's just kind of things like people who protest and get arrested. Is it the likelihood that they'll protest again, because then they'll get arrested again? The point is that there are tough questions notwithstanding our strong belief in the ability of the RCT methodology to suss out real causal relationships. There are still really hard questions that we have to contend with in doing this type of research. Actually, I remembered my other point. So as somebody who builds software... we're living in a world where, like, every piece of software has bugs. And so this was the response to, like, should these models be allowed to be closed? And I worry a lot about bugs. So maybe someday we'll live in a world where you can have verified software, and then we have to be able to write specifications correctly, and I'm a little suspect about that. But in a world where every piece of software has a bug, the thought of software that has bugs then being used in a fashion where I have no idea, A, what it thinks it's doing and then, B, what it's actually doing makes me really nervous. On that note, I think we'll have plenty of hands. I'm gonna ask Dan to play the Phil Donahue role and run it down the aisles. Please. Hello, I am Kathy. I am also a computer scientist and I'm excited to see two women black belt computer scientists on the stage. I have so many thoughts but I'll limit it to two. First on the transparency side, the United States government spends $96 billion on government contracts, in case folks didn't know. It's an area that I've worked in quite a bit, and the whole idea of getting a pill that you know will kill you and probably taking that pill again is what we do. We pay for contracts that are not transparent that we know will probably not work. We rehire the same contractors that have failed us and then we continue to do it. I have a long list of examples. We actually try to implement... I'm with the United States Digital Service, and we've actually tried to say, we won't take a contract unless you develop in the open, and companies fight so hard against that for so many reasons, citing really ridiculous reasons. So I would love to talk more about that, but it's a $90 billion industry, and we continually, we as a federal government in the United States continuously buy things that don't work, and at one point there was an argument for these are taxpayer dollars, it should be open. American people are paying for it. They should see what's in there and we don't. 
And then the other bit was on the data side. I totally hear the trained people on the risk assessment scores and things like that, but also, as a developer, and seeing what my colleagues and even myself at times do... Margo, you mentioned you have these if-then statements, like if you're wearing purple or if you've been convicted three times; all those things are biased. And even today many engineering shops don't think about that. We talk about it as part of our lives here in ethics and AI. Most groups don't talk about that. You get excited about the data and you just go build something. So how do we infuse that back into companies and groups and even students maybe working with the data? We can debate here all day about what to do once things are shipped, but there are so many groups building today that are just not going to think about the biases in the code we're writing. I am teaching it in a new course that I'm developing this semester. The syllabus is secret. Actually, actually. So actually this issue you raised about software developers not understanding and thinking about the ethics and bias is huge. And I think that we as educational institutions have a role to play, and I am delighted that not only is Cynthia developing a course that really addresses that, we have a program throughout our entire CS curriculum that we piloted last spring and are expanding this year to, in every course that we can, and this year it's up to about six or eight, actually inject issues of ethics embedded in the material. It's not just enough to take a separate course in ethics. You actually have to consider the ethics while you're in the throes of building, designing a user interface or while you're in the throes of cleaning the data. And so there really has to be a push, starting at the education level and then persisting throughout the companies, of considering ethics in the same way that we consider maintainability and scalability and performance. It's got to be an equal pillar and it starts with us. This is what I wanted to know. Afterwards... we have a teaching fellow position. I'll also add on that point that, as part of the ethics and governance of AI initiative at Berkman, we've been really focused on procurement, the point of procurement, which is actually more at the state level here than at the federal level for the kinds of tools we're talking about here. But at that point of procurement, empowering state procurement officials to request the information they need to make these decisions, to assess whether ethical considerations and others have been incorporated into the design of these systems, that I think is a big part of what we're gonna be trying to do. Mr. Holland. Hi, my name is Adam Holland. I'm staff at the Berkman Center on the lawyer side of things. There's a thread that I'm hearing underlying everything everyone is saying, and thank you all for this wonderful discussion. And I'm gonna try to articulate it so that you can speak to it. You've mentioned just outcomes, or the outcomes being just; we've heard about whether you can explain the result; you've heard about augmenting human decision making capacity, et cetera. To my ear, all that presumes that we have an idea of some better decision making process towards which we're moving teleologically. And so what I wanna know is how do we know what that is? How will we know if we've achieved it? How will we know if maybe we've somehow moved past it, past the optimum, to a different flavor of good but not perfect? 
And to the question of the proprietary stuff, are those gonna be updated? Because I just know way too many examples of, oh, the government bought some software, and that was about 15 years ago, and it hasn't been updated because we can't afford it. But if the model is predicated on learning, how will it know if it's getting it right? Will it incorporate last year's data? Oh, we had this many false positives, this many false negatives, let's tweak the model; or is it, sorry, proprietary: you can pay for the upgrade if you want. But more broadly, I'm totally biased, I'm human. I would love to augment that. How do I know what I'm augmenting toward? Everybody has this idea that AI is emotionless, without bias, but what is an optimal set of decision making criteria? Just a small question. Yeah. If you could just, yeah. Not it. So of course that's a huge question. It's also in my view exactly the right question. And as I'm trying to work out this general theory, one thing that seems clear is, somehow, if we knew for a given classification task who ought to be treated similarly to whom, then a lot of things would become easier. It's not automatic, but a lot of things become a lot easier. So my belief is that the real use of AI and machine learning is going to be in discovering this metric: for a given classification, how similar are you to me? And for every pair of people, how similar are they to each other? And I believe that this is, that we're going to... it's going to be an imperfect but improving situation. And that at the best that we're, I mean, we're gonna be doing society's best guess, but as we do more and more research, our notion of what the best guess is will improve. That's the only path that I see toward answering this. So I really think that the real question for machine learning is, how do we determine who is similar to whom for this particular classification task? 
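That "treat similar people similarly" idea can be written down as a constraint, roughly: for a task-specific similarity metric d, the gap between two people's scores should never exceed what d allows for that pair. The sketch below only audits a classifier against such a constraint; the metric d here is a made-up placeholder, and finding the right metric is, as Cynthia says, the hard part.

from itertools import combinations

def d(x, y):
    # Hypothetical task-specific similarity metric: 0 means "treat identically".
    return abs(x["prior_arrests"] - y["prior_arrests"]) / 10.0

def audit(score, people, metric=d):
    """Flag pairs whose score gap exceeds what the similarity metric allows."""
    return [(x["name"], y["name"])
            for x, y in combinations(people, 2)
            if abs(score(x) - score(y)) > metric(x, y)]

# A dubious classifier that keys on zip code, which the metric does not care about.
score = lambda p: 0.9 if p["zip"] == "02138" else 0.2

people = [
    {"name": "x", "zip": "02138", "prior_arrests": 1},
    {"name": "y", "zip": "02139", "prior_arrests": 1},  # a similar person, scored very differently
]
print(audit(score, people))   # [('x', 'y')]: the similar-treatment constraint is violated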
In that sense, colloquially speaking, it's... I've always thought, Cynthia, of your work as less being able to tell us what's fair and more being able to tell us what's unfair and prevent it. You'd think they would just be the inverse of each other, but maybe not, at least colloquially speaking. And it may be that what you're articulating, to a lawyer's ears, would sound like equal protection. That you want like to be treated alike. And that's an equal protection concept that's pretty fundamental in the American, and around the world, frameworks of the rule of law. But equal protection still doesn't tell you the content of, say, when you get past the simpler questions of bail, where it really is, are they going to hurt anyone and will they come back to face the music? Like, those are empirical predictions you could make. If you're trying to figure out whether it's time to release somebody early from serving a sentence or how long to sentence a person, a big part of the formula is just an innate question of what punishment fits what crime. And you could still ask equal protection questions around it, but you need an initial sense of what counts as a grave offense. I mean, the Harvard professors that stood across Mass Ave in a carefully choreographed sort of thing with the police, who were alerted that they were going to do it, the police stopped the traffic so they could walk out into the street, then they're like, please, are we gonna arrest you? You know, that's not likely to result in a book being thrown at them. They've got plenty already there, professors, for what they're doing. And that might seem fair, given that it was performative. But that all requires a theory of justice that it would be a lot to ask machine learning pattern recognition to tell you; you've gotta fill it in. And sometimes legislatures do, sometimes judges do with their various feelings. I don't know if that would change. The one other thing you do point out, though, is that to the extent that we started to ease the work of a typical judge by only presenting to the judge the really borderline, iffy cases, and trying to dispose more efficiently, maybe with advice, maybe just summarily, the cases that just fit the template so easily we don't wanna waste anybody's time with them, you do risk ossifying the training set on which the judgments from real judges were made in whatever year you changed the system, grateful for the efficiencies introduced by the machine learning algorithms. And it would just be weird to imagine, like, we're gonna get equal protection for justice as meted out in 1957. Like that, it's like, yep, it's totally consistent with what these judges from 1957 would have done. I don't know if it means there should just be some exercise where every so often we just go on a judging spree to see if things have changed since the last time judges were judging. But then again, the judges might be rusty. Do you keep a couple of judges running in old-fashioned artisanal courtrooms, like, here the justice is handmade? And there are other courts that just, like, spit it out the way that most of our food is produced. Maybe that would work, but again, it would be a somewhat self-conscious exercise by the judges. So, stop and think about it. Chris? I have a similar view to what the others have said, slightly different as well. So when I hear your question, I immediately think of an economic optimization problem. Those who are schooled in the economic analysis of the law know that with tort law, for example, we can set a standard of care in a model such that anything below that will probably lead to too many accidents and anything above that is constraining our liberty too much relative to the cost of those accidents. Are we trying to do the same thing here, especially in the criminal justice realm perhaps? But I really don't believe that any of these risk assessment tools can solve that optimization problem. The only actors, the only way that we can get to that optimization is through a conversation in the populace. That's a reason why, with the PSA for example, the matrix of options that are then mapped onto the risk scores is chosen by the jurisdiction itself. A jurisdiction has to decide what its tolerance for risk is, what its tolerance for the increasing cost of incarceration might be. That's what's gonna get updated time and time again after the jurisdiction sees its decision makers using the risk assessment tool, but the tool itself, I don't believe, will ever be able to get you to that X star, that optimized outcome. Rather it's the interaction of the information provision from the tool with the decisions that are made; what's being updated over time is our ability to accept certain restrictions on liberty but also, on the other side, certain bad outcomes. We're trying to balance those two competing interests. Do you have a sense of how different those matrices look in different jurisdictions? They can't be too different, because the creators of the tool want certain general outcomes to be in place. For example, in the upper left, you're really looking for OR release, own recognizance, no conditions of any sort. 
In the bottom right, if you're in a right to bail state, we want those people detained, but we have to set bail really high. In the middle, that's where a lot of the decisions are made about what's offered, based largely on the resources that the jurisdiction has. But in general, it's from upper left to bottom right, just more strenuous conditions, but reducing as well the amount of monetary bail that we rely upon to secure release. So you didn't really answer my question. My question... The answer is, there is some baseline uniformity in that they want certain types of release to appear somewhere on the matrix, but the jurisdiction gets to choose where they're located. I'm conscious of time, but we're gonna try and squeeze in one more from Cecil, yep. Hi, I'm Cecil. I'm an LLM student here at Harvard, and mine is actually a follow up of that question. So we established at the beginning that human beings are not just, not fair. They have their own biases. So in my mind, immediately I put up a normal curve where the decisions are listed from unfair to very fair, and the bulk of them are here, because human beings are biased; the bulk of them are with bias. So with AI, we're trying to have a continuous one. So we're trying to get rid of the outliers. I understand that, but getting rid of the outliers means that we're gonna have more decisions with bias, because we know that human beings produce these decisions with their bias. So I'm curious, isn't it actually dangerous to use data coming from biased human beings, because the AI will get rid of the outliers and produce more bias into the system? Another small question. I think this is the point that was made before with the garbage in, garbage out. Yes, it's... I'm not quite sure about connecting it to this outlier question, but if you're training based on biased data and you're not doing anything to compensate for this, then if you have a good learner, it will imbibe the bias with the mother's milk of the data, and the resulting algorithm will be biased. One follow-up: isn't every data set going to be biased, because human beings are? What I'm trying to say is, wouldn't it make more sense if they were given artificially produced data? But where do you get, how do you produce that artificial data? So I think we all agree there is some danger inherent in any data set that was produced by human behavior. I'm willing to buy into that. I don't know what the alternative is, because how do I generate realistic but unbiased data? I think we're gonna have to leave it there. Join me in thanking Chris, Margo, Cynthia, and Jonathan, and thank you all for being here. Thank you, Chris. Thank you. Chris.