The title of our talk is "Resolving Algorithmic Fairness", and our aim is to provide a conceptual framework for thinking about algorithmic fairness, and about resolving it, in a way that will hopefully become clear as we go along.

First, some background. Predictive algorithms are used in a variety of contexts and applications: in healthcare, in the assignment of welfare benefits, in the criminal justice system, in the banking industry. What they do is assign scores to individuals, scores within a certain range, and the score is used to predict a behavior or condition of interest: sickness in the medical context, the ability to repay a loan in banking, the need for housing when assigning welfare benefits, or recidivism, and criminality generally, in the criminal justice context. The higher the score, supposedly, the more likely the behavior or condition of interest.

The basic picture is that we start with a large set of data, and then some predictive analytics or machine learning technique mines through this data, looking for correlations between observable or knowable attributes of an individual and the outcome we want to predict. Through that process we build a risk model, an abstract representation of the correlation between attributes and outcome, and we use the risk model to make predictions.

Now it turns out, perhaps not so surprisingly, that even though these predictive algorithms promise to be free from human bias, they themselves exhibit various forms of bias. There is a large literature on this, and a number of people argue that predictive algorithms exacerbate existing inequities in society. The journalistic piece that started this debate and brought it into the public light is the ProPublica piece "Machine Bias", which I'm sure everybody in this audience is familiar with. So this is a battleground between those who are rather critical of predictive and machine learning algorithms and those who have a much more positive view of them. Our take, as will become clear in the talk, is to try to understand, from a conceptual and theoretical point of view, why it is so hard for predictive algorithms to deliver fair decisions across different groups, for example across different racial groups.

Before getting to that point, it is important to be clear about terminology, so I hope you will indulge me in a bit of formal preliminaries. The first notion I want to get clear about is the notion of a risk model: a formal representation of a relationship between a set of attributes and the outcome we want to predict. The way Robin and I understand a risk model, which may be a little non-standard, is to think of an individual as a collection of attributes, in fact an infinite collection of attributes, and so we use x-arrow-infinity, an infinite vector of attributes, to denote an individual. Then there is the outcome y, the outcome we want to predict, which is a function of the attributes that characterize the individual; and there is the group g, which is simply one of the individual's attributes.
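In symbols, and this is just our transcription of the notation being described, not a formula lifted from the speakers' slides:

```latex
\[
  \vec{x}_\infty = (x_1, x_2, x_3, \dots), \qquad
  y = f(\vec{x}_\infty) \in \{0,1\}, \qquad
  g \in \{x_1, x_2, \dots\}
\]
% An individual is an infinite vector of attributes, the outcome y is a
% function of those attributes, and the group g is itself one of the attributes.
```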
Now, when we use the risk model to make a prediction, we inevitably rely on a subset of this infinite number of attributes, because only some attributes are observable, only some are accessible in various ways. So the risk model yields a prediction, in the form of a score or risk score, based on a limited set of attributes which we call X_p: x_1 up to x_p. We call the attributes used in the model predictors; some people call them risk factors; the terminology is not really uniform here. Each of these attributes is associated with a regression coefficient that expresses the strength of the correlation between the predictor and the outcome of interest.

Here is an example from the criminal justice context: the Public Safety Assessment risk model, which is used in a variety of jurisdictions in the United States. The outcome to be predicted is new criminal activity by an individual. Then you have the various risk factors or predictors, such as age at current arrest, or pending charge at the time of the offense, and so on. On the right-hand side of the table you have the points, which are essentially those regression coefficients: they indicate the strength of the correlation between each predictor and the outcome of interest. So that is the first formal preliminary, the notion of a risk model.

The second preliminary concerns formal notions of fairness, group fairness in particular. The literature on algorithmic fairness has a proliferation of fairness measures, and it is really hard to orient yourself in it. What Robin and I have done, to simplify things a little, is to look at group fairness criteria from two perspectives, which we call classification parity and predictive parity. The basic difference is which variables you condition on. In the case of classification parity, you condition on the outcome y, the outcome you want to predict, and on the group, and then you look at the prediction made by the risk model. In the case of predictive parity, the other perspective on group fairness, you condition on the prediction made by the model, the score s, and then you look at what the actual outcome is.

An example of classification parity is equality in false positive and false negative rates. The important thing is that we condition on y = 0 or on y = 1, and we require those false positive and false negative rates to be the same across whatever groups we consider, for example across racial groups. There are different instantiations of the classification parity idea. When we talk about equal false positive and false negative rates, we are thresholding the score at a certain value a, requiring the score to be above or below that threshold; but we don't have to do that, and another example of classification parity that does not rely on thresholding the score is called balance. The common feature is that in every case we condition on the outcome. With predictive parity, by contrast, we no longer condition on the outcome but on the score, and a typical example is equality in positive predictive value: we require the positive predictive value to be equal across the two groups, conditional on the score being above a certain threshold a.
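As a concrete illustration of the two perspectives, here is a minimal Python sketch that computes the classification-parity quantities (false positive and false negative rates) and the predictive-parity quantity (positive predictive value) per group. The function name, the array interface, and the default threshold are our own illustrative choices, not the speakers' code:

```python
import numpy as np

def group_fairness_report(y, score, group, a=0.5):
    """Per-group FPR, FNR (classification parity) and PPV (predictive parity).

    y     : array of true binary outcomes (0/1)
    score : array of risk scores in [0, 1]
    group : array of group labels
    a     : threshold at which the score is turned into a positive prediction
    """
    pred = score >= a  # thresholded prediction
    report = {}
    for gr in np.unique(group):
        m = group == gr
        # Classification parity: condition on the true outcome y.
        fpr = np.mean(pred[m & (y == 0)])   # P(pred=1 | y=0, g)
        fnr = np.mean(~pred[m & (y == 1)])  # P(pred=0 | y=1, g)
        # Predictive parity: condition on the prediction.
        ppv = np.mean(y[m & pred])          # P(y=1 | pred=1, g)
        report[gr] = dict(FPR=fpr, FNR=fnr, PPV=ppv)
    return report
```

Classification parity then asks whether FPR and FNR agree across the groups in the report; predictive parity asks the same of PPV.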
Now, probably one of the most interesting aspects of the literature on algorithmic fairness are the impossibility theorems. There is a variety of them, but the basic idea is that these different fairness criteria, each desirable in its own way, cannot be simultaneously satisfied if the prevalence rates across the groups under consideration differ. And that assumption is fairly natural: given how society is structured, it is not common for different racial groups, for example, to have similar rates of whatever behavior or condition we want to predict.

So a natural question is: why is what we call perfect fairness, fairness along all these different dimensions, so difficult to achieve? What is the deep reason behind that? There are a variety of answers one could give. You could blame the data, saying the data are bad or flawed in some way; you could blame the risk model, saying some mistake was made in how it was constructed. Our strategy is slightly different. First we build an ideal risk model that delivers perfect fairness, in which all the measures of fairness are satisfied. Then we examine the assumptions that are required to make perfect fairness possible, relax those assumptions, and witness a progressive degradation, a departure from perfect fairness. And we provide a theoretical framework for thinking about this degradation, a framework we borrow from the statistical literature on model predictive accuracy and individual risk. That is the general plan.

The first thing is a simulation. The setup is that we generate our data, and on this basis we construct our ideal risk model. How do we generate the data? We consider a large set of individuals, each possessing 20 attributes; these could be socioeconomic status, age, educational level, and so on. We assume that eight of these 20 attributes are correlated in some way with group membership: educational level or socioeconomic status, for example, could very well be correlated with race, so combined in a certain way they are somewhat of a proxy for race. The other attributes are not correlated with race at all; think of age, say. And then we assume there is a deterministic function that, given the attributes of an individual, yields the outcome of interest, the behavior or condition we want to predict. So there is a deterministic relation between whether an individual possesses a certain combination of attributes and the outcome. That is how we assume the world to be regulated.

We then generate the data in great numbers and divide it into a training data set and a test data set. How big are the data sets? Ten to the power of five, so essentially 100,000 individuals. The difference between the training data set and the test data set, besides how they are used, concerns the group variable G: in the training data set 40 percent of individuals are in group one, while in the test data set 60 percent are in group one. A slight difference between the two, just to make things a little more realistic.
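Here is a minimal sketch of a data-generating process of the kind just described. The specific distributions, the coefficients, and the way the eight group-correlated attributes are produced are our own illustrative assumptions; the talk only specifies 20 attributes, 8 of them correlated with group, a deterministic outcome function, and the 40/60 group shares:

```python
import numpy as np

rng = np.random.default_rng(0)
W_TRUE = rng.normal(size=20)  # fixed "true" weights governing the outcome

def generate(n, p_group1):
    """Simulate n individuals: 20 attributes, 8 correlated with group,
    and an outcome that is a deterministic function of the attributes."""
    g = rng.binomial(1, p_group1, size=n)                # group membership
    # Eight attributes whose means shift with group (proxies for group),
    # twelve attributes independent of group.
    proxies = rng.normal(loc=0.5 * g[:, None], size=(n, 8))
    neutral = rng.normal(size=(n, 12))
    X = np.hstack([proxies, neutral])
    y = (X @ W_TRUE > 0).astype(int)                     # deterministic outcome
    return X, y, g

X_train, y_train, g_train = generate(100_000, p_group1=0.40)
X_test,  y_test,  g_test  = generate(100_000, p_group1=0.60)
```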
Finally, having generated this simulated world, we can build our risk model on it. We fit a regression model on the training data, evaluate the resulting risk model on the test data, optimize, and we get this ideal risk model to make our predictions. As it turns out, this ideal risk model is really good. It performs very well on the dimension of predictive parity, meaning there is very little difference in how the predictions made by the risk model perform for one group versus the other: the difference in positive predictive value is 0.013 percent, very small. The same holds for classification parity, the other notion of fairness I introduced earlier: the differences in false positive and false negative rates between the two groups are very small. And if we look at the accuracy of the risk model, independently of these comparative notions of fairness, the model performs very well too: nearly all classifications are correct, with only 10 errors out of 100,000 instances, and the out-of-sample squared error loss, the difference between the true outcome, which here is either zero or one, and the predicted outcome, a probability between zero and one, is very low. So this algorithm performs great on both fairness measures and accuracy measures.

How did we get to this idealized situation? On the slide you see listed the idealized assumptions we used. First, all the attributes relevant to making the prediction were observable and used in the risk model: we had full information, nothing was left out. Second, our training sample was both large and representative; it was not skewed in any way. Third, we made a determinism assumption: recall that there is a deterministic function from a certain combination of attributes to the outcome of interest; Robin will talk about that in more detail later. Fourth and finally, no predictor used in our model was misleading; they were all true and reliable predictors, so the model was well specified in that sense.

What happens if we drop the first assumption? Here is what happens to predictive accuracy, a non-comparative measure: how good are the predictions of our risk model compared to the true outcomes? You see that as we take fewer and fewer predictors into account, we move further and further away from zero squared error loss; and when the squared error loss is zero, we have perfect predictive accuracy. So as we use fewer and fewer predictors, we move further away from perfect predictive accuracy.
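A minimal sketch of that first relaxation: fit on progressively fewer predictors and track the out-of-sample squared error loss, reusing the arrays from the generation sketch above. The talk does not say which regression family was fitted, so scikit-learn's LogisticRegression, and the order in which predictors are removed, are our assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def loss_with_k_predictors(k):
    """Out-of-sample squared error loss using only the first k predictors."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:, :k], y_train)
    prob = model.predict_proba(X_test[:, :k])[:, 1]  # predicted risk score
    return np.mean((y_test - prob) ** 2)             # squared error loss

for k in (20, 15, 10, 5, 1):
    print(k, "predictors -> loss", round(loss_with_k_predictors(k), 4))
```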
Now, when it comes to the comparative measures of fairness, we see a similar degradation, but things are a little more complicated. Let's start with the picture on the right-hand side: predictive parity degrades progressively as you move away from the full 20 attributes, as you employ fewer and fewer of them. Classification parity, on the left-hand side, is a little different. The interesting thing is that you have perfect classification parity at the two extremes: when you use the full range of predictors and when you use zero predictors; in between, classification parity varies in ways that are not fully explainable.

What we can do next is drop the other assumptions progressively: the second, the third, and the fourth, that is, representativeness, determinism, and model specification. What we get is a progressive degradation of predictive accuracy, which you see here. The plot at the top, the one with the little circles, is the one in which we inserted predictors that are not truly indicative of the outcome, not true predictors, and you see that even with 30 predictors you cannot really get to zero squared error loss; you can never approximate complete predictive accuracy. You see a similar degradation in the various measures of fairness: as you drop the assumptions, it becomes increasingly difficult to get to full fairness. This plot is the one in which we introduced predictors that do not truly track the outcome we want to predict, and you see that even if you use a lot of predictors you cannot get to full equality in positive predictive value, nor equality in false positive or false negative rates, and so on. Interestingly, though, if you use zero predictors you still perform extremely well in terms of the various measures of classification parity.
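Incidentally, the perfect classification parity at zero predictors has a simple explanation: with no predictors the score is the same constant for everyone, so the thresholded prediction is identical for every individual, and the error rates conditional on the outcome cannot differ by group. A two-line check, reusing the hypothetical group_fairness_report helper and the test arrays from the earlier sketches:

```python
const_score = np.full(len(y_test), 0.5)  # the "zero predictor" model
print(group_fairness_report(y_test, const_score, g_test, a=0.5))
# FPR and FNR match across groups exactly (classification parity holds),
# while PPV equals each group's base rate, so predictive parity fails.
```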
So I am going to hand it over to Robin now, who will give you a more theoretical understanding of what is going on here. Robin, I am going to stop sharing the screen.

Okay, thank you, Marcello. Let me pull up the slides on my side and put this into full screen. Okay, good. All right, thank you, Marcello. That was a lot of simulations that we have done and put in front of you, and as I take over I want to invite you to step back a little and explain why we are trying to look at things in this particular way: why we are trying to peel away perfect fairness, layer by layer, down to where we no longer have it. It all comes down to this: we believe that, fundamentally, whenever a modeler or an institution employs an algorithm to make predictive guesses about people's personal, individual outcomes, there is always an implied notion of individual risk that the algorithm is after. We are estimating something, and that something is what we conceive of as individual risk. What exactly is this individual risk, and how should we conceive of it?

Marcello introduced earlier the idea of thinking about every individual, to the full extent of their individuality, as expressed by the infinite vector of attributes x-infinity. We also invite you to imagine that we had access to an almighty model, so rich and so correct that, if we could feed it this infinite information about an individual, the model s-infinity would deliver the ideal risk for this person to the full extent of x-infinity. You cannot get more idealistic than this; call that number the individual risk of the person x-infinity. But of course that is all very conceptual, a dive into a world of intangible attributes. So let's step back a little and think about reality: we cannot measure all the possible attributes of a person, only the ones that can be observed practically. Still, think about having a great algorithm that delivers the best risk quantification based on this practically observable information; we may well call that our conceptualization of individual risk. Even that is not quite practical, because in reality, in the context of criminal justice, or in hospitals, or in clinical diagnosis and so on, some information is deemed inadmissible to the predictive algorithm: we can only use a limited set of information. Things that will be excluded include, for example, the very notion of group, which may be thought of as an inadmissible or sensitive attribute, or things that have to do with static features of people. Those things we may not have access to, but we are still thinking about delivering, ideally, as best we can, a risk quantification based on the limited information X_p we have about this person. And even that is not the whole story, because in reality, in order to deliver and fit the model, we have to use a perhaps limited set of training data. The difference between the third and the fourth layer here is the theta indexing the risk model: we use theta-star to denote the best model we could have, had we had access to everybody in this world, the true parameters that govern the risk, without any estimation; whereas in reality we have to estimate it with a theta-hat based on a limited training set.
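To keep these layers straight, here is one way to write down the hierarchy just described, together with the error decomposition it supports in what follows. The notation is our reconstruction, not the authors' slides; the decomposition is exact when the target risk is the conditional expectation of y given information that includes the model's predictors:

```latex
% From most practical to most ideal:
%   s_{\hat\theta}(X_p)       -- fitted model, theta estimated from finite training data
%   s_{\theta^*}(X_p)         -- best model given only the admissible predictors X_p
%   s(X_{\mathrm{obs}})       -- best score given all practically observable attributes
%   s_\infty(\vec{x}_\infty)  -- the almighty score given the infinite attribute vector
%   y                         -- the ultimate outcome itself
\[
  E\!\left[(s_{\hat\theta} - y)^2\right]
  = \underbrace{E\!\left[(s_{\hat\theta} - s_{\mathrm{target}})^2\right]}_{\text{variance: hardness of estimation}}
  \;+\;
  \underbrace{E\!\left[(s_{\mathrm{target}} - y)^2\right]}_{\text{bias: target risk vs.\ ultimate outcome}}
\]
```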
Why are we talking about all these different layers of possible notions of individual risk? Because, in the context of understanding predictive accuracy, we have well-established frameworks from the statistical literature that help us understand how accuracy, or the lack of it, decomposes in terms of how we conceptualize individual risk. This is thanks to the multi-resolution framework proposed by Professor Xiao-Li Meng, and advanced very recently in a paper by Li and Meng, to appear in the Journal of the American Statistical Association; you will see the references toward the end. I want to explain how this framework, which decomposes algorithmic accuracy, can help us understand our goal of achieving fairness. If we think of any one of these risk scores as potentially being what we conceptualize as the ideal risk score, the one we want to call our target, we can ask: how far away are we, in using that individual risk, from approximating the ultimate outcome of the individual? In particular, these risk scores are arranged from the lowest resolution, what we can practically deliver with limited information, all the way to the highest resolution, where the almighty risk model has access to everybody's attributes. Depending on the resolution at which we choose to place our conceptualized individual risk, whatever lies between the target individual risk and the ultimate outcome is called bias, not in the sense of fairness but in the sense of estimation bias: the discrepancy between our target individual risk and the ultimate outcome. And whatever lies between what is practically achievable and the target individual risk is the variance: practically speaking, how difficult it is for us to deliver on this idealized risk model.

Now, in the examples we have been showing you, especially the first one, where we achieved perfect accuracy, we took away all the fuss about variance, the difficulty of estimation: we used a perfect simulation with a large sample, conceiving of ourselves as the oracle who knows everything about the individual. That is the conceptualized notion of individual risk at the highest resolution, and we could deliver it because we knew the right model. But a question remains: even if we can access and deliver s-infinity, is that the end of the story? To put it more technically: is the expected squared error loss between s-infinity and the actual outcome larger than zero, or equal to zero? This becomes a genuinely philosophical debate about whether the world is ultimately deterministic, or whether there is truly some kind of objective chance that dictates the outcomes of individuals. To be clear, in the context of classical model fitting for predictive algorithms, whether this quantity is zero or larger than zero does not matter, because all people care about is pushing this error as low as possible, as close to zero as we can get, and if we can get to zero, fine. But in the case of fairness, whether we are exactly at zero or not makes a world of difference.

The way we instilled determinism into our simulation, as Marcello mentioned, was to set it up so that the outcome of the individual is a deterministic function of a probit score, essentially a transformed version of a weighted sum of the individual's attributes. We are going to swap that out for what we call the objective chance assumption, another way you can think the world works: there truly is this probit score, the transformed version of the weighted sum of the attributes, a fractional number that dictates the propensity with which this person comes out with a positive outcome; an oracle then flips a coin, and that randomness is not something we can ever get hold of.
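A minimal sketch of swapping determinism for the objective chance assumption, in the same Python setting as the earlier generation sketch (np, rng, W_TRUE, and X_test are reused from there). The probit link is named in the talk; everything else here is an illustrative assumption:

```python
from scipy.stats import norm

def generate_objective_chance(X):
    """Outcomes drawn by an 'oracle coin flip' from a true fractional risk."""
    true_risk = norm.cdf(X @ W_TRUE)   # probit-transformed weighted sum
    y = rng.binomial(1, true_risk)     # irreducible randomness: the coin flip
    return y, true_risk

y_oc, true_risk = generate_objective_chance(X_test)
# Even the true risk itself cannot drive the squared error loss to zero:
print("loss of the true risk:", np.mean((y_oc - true_risk) ** 2))
# On average this equals E[r(1-r)], which is > 0 whenever risks are fractional.
```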
Now, if we re-run the simulation under this new assumption, endorsing objective chance: on the left-hand side you see the so-called true risks for the 100,000 people in the test data; by the flip-of-the-coin mechanism, these realize into the simulated actual observations, the binary outcomes. And if we use a great model, fitted on a large, representative training set, to try to capture the true risk, we see that the estimated risks capture the true risks very well, but the distribution on the right is very far from the actual outcomes. So as far as accuracy is concerned, we are not there: even using all the useful and truthful predictors to fit the model, we are not able to achieve absolute accuracy, and in so doing we also fail to achieve classification parity, however you like to measure it, as well as predictive parity, here measured by the positive predictive value.

What we have really shown you, and this is the last slide, summarizing what we have said so far, is this: we urge everybody to recognize that using algorithmic predictions about personal outcomes commands the modeler to conceive a notion of individual risk in one way or another, and that perfect fairness as a requirement is possible, but only under extremely idealized conditions, conditions that are very abstract and largely untestable in reality. We have shown that, in order to achieve it, we not only require perfect data and access to a perfect model; we also need the very particular assumption that individual risk is a deterministic function of the individual's own attributes. As soon as we step even a tiny bit away from these idealized assumptions, perfect fairness starts to unravel. We do not even have to go to the realm of bad or unrepresentative data or a terribly specified model: we just have to step back a little, to still-good-but-smaller data, or give up our notion of determinism, and perfect fairness eludes our grasp. Understanding this, which would perhaps be our future work, calls for a theoretical exposition of how fairness degrades as we gradually move away from this set of idealized assumptions, at what rates, and under what circumstances the degradation is controllable within the algorithm, or within the modeler's conscious choices.

As a note on what we propose, empirically, as a resolution: so far there is no way to truly resolve the impossibility theorems as they realize themselves in actual model building, and the step taken in the machine learning literature is to use some kind of constrained optimization: to say that fairness really imposes an external constraint on our pursuit of accuracy, and to do model selection with fairness in mind as a constraint. I think this is a well-motivated approach, both practical and decision-theoretically supported, but at the same time we should understand that as soon as we begin to do that, there is a perhaps subtle change in the notion of exactly what we are after, of exactly what we are now construing as individual risk: it may no longer be what we thought we were after. That is something to keep in mind.
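For concreteness, here is one very simple instance of the constrained approach just mentioned: among a family of candidate group-specific thresholds, pick the most accurate one whose false-positive-rate gap stays below a tolerance. Real fairness-constrained learning operates on model parameters rather than thresholds, so this is only a hypothetical illustration of fairness acting as an external constraint on the pursuit of accuracy; the group labels are assumed to be 0 and 1:

```python
import numpy as np
from itertools import product

def constrained_thresholds(y, score, group, eps=0.02):
    """Pick per-group thresholds minimizing error subject to |FPR gap| <= eps."""
    best, best_err = None, np.inf
    grid = np.linspace(0.1, 0.9, 17)
    for a0, a1 in product(grid, grid):           # one threshold per group
        pred = np.where(group == 0, score >= a0, score >= a1)
        fprs = [np.mean(pred[(group == gr) & (y == 0)]) for gr in (0, 1)]
        if abs(fprs[0] - fprs[1]) > eps:         # fairness constraint
            continue
        err = np.mean(pred != y)                 # accuracy objective
        if err < best_err:
            best, best_err = (a0, a1), err
    return best, best_err
```

Note how the returned thresholds generally differ by group: the constraint changes what the scores are being used to estimate, which is exactly the subtle shift in the notion of individual risk mentioned above.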
And we are thinking, and this is really for future work, whether there is another way to put a more positive spin on this: instead of thinking of fairness as a constraint, think of it as a catalyst. We have now seen that fairness and accuracy can have this tiny intersection just when everything is perfect; but can we also think of fairness as a criterion asking for a higher-order sense of accuracy, namely subgroup accuracy, demanding that the model perform well not only overall but also for every one of the groups? This is related to the sense of subset calibration discussed in Philip Dawid's paper on individual risk: we would like to pose higher-order senses of accuracy, in the relational, subgroup sense, and perhaps be able to bring the pursuits of fairness and accuracy along the same lines. So that is really it; here is a select collection of references, and we are very happy to take questions. Thank you very much for listening.

Wonderful, thanks so much for that; that really blasted the cobwebs out at nine o'clock on a Thursday morning, super stuff. Okay, the way we do the Q&A: for those of you who are attendees, if you put your question into the Q&A panel, I will draw it out from there; those of you who are panelists on video, if you raise your hand, I will come to you; and I will aim to rotate between different disciplinary perspectives. Then, as Michael says in the chat, we will continue the discussion over on the Slack after our hour is up. We will start with a question from computer science. Lexing, would you like to take the first question, please?

Thank you both for the very thought-provoking talk and the clarity of the analysis; I enjoyed it very much. I will start with a specific question. On one of the slides you said this idealistic scenario has four assumptions. You have essentially tweaked and varied three of them; the fourth was model misspecification. I wonder whether you have thoughts about how to vary that one, especially in relation to, say, classifier transparency. I suspect that one probably should be analyzed in a way that is different from the others. Would you agree? Does that make sense?

I will try to give a stab at your question; thank you very much, I really appreciate it, this is a great question. Model misspecification is, again, a somewhat conceptual thing, and we almost always think that we never really have the right model in reality. The way we played with that assumption in our analysis is that, instead of misspecifying the model as such, we misspecified some attributes: that is the graph Marcello showed, where additional predictors that should not be there were put into the model. What you see there is, first, that it slows the rate at which you can achieve good accuracy and good fairness; and also, because you spend extra effort estimating these extraneous parameters, it makes the overall model performance worse, since you are spending energy on things that certainly should not be there. I think this question is very deep, because when we only care about prediction accuracy, exactly what we mean by model specification is itself questionable. In the context of estimation we have a very precise notion: the model has to be correct. But a model can fail to respect the generating mechanism of the data and still achieve good predictive accuracy; that is possible.
And I do not know what is available right now: all these deep learning things are black boxes, which is perhaps not what you want. But if the model is so intricately and massively parametrized, and really able to find a very fine-grained map between the attribute space and the outcome space, maybe there is a way it can even exhaust all the possibilities and deliver predictive accuracy. We do not know that, but it is something to think about.

Thanks for that answer. As you were talking, I realized that my group has recently been looking a little at the long-standing literature on universal approximation, in particular for these neural network models. In that case, if your model is a certain class of functions, it is able to approximate any function you want; so maybe the problem of model specification is converted into one of those you outlined before, basically estimation bias and variance and so on, right?

Exactly. But I think your emphasis on transparency is important too, because part of fairness, as a way of communicating with the users, requires transparency; regardless, it just adds to the equation. Unfortunately, that is not something deep learning can currently provide, so there is something to think about, or work in progress.

Yeah, right, exactly, yes.

Okay, so the next question comes from Mario, from the perspective of philosophy.

Thank you for a wonderful talk, Robin and Marcello, I really enjoyed it. I find your notion of determinism very interesting. If I understood the talk correctly, you say something like: if we had perfect information, all the information we could wish for, it is still better to have this deterministic function, compared with a non-deterministic or statistical function, to get a better, indeed perfect, prediction. I find that very interesting, and I would like to hear more about why the determinism assumption adds something over other, non-deterministic functions. And also, zooming out a little: isn't it a problem to use a deterministic function over very many attributes as a guiding idea of where we should go, namely the moment you think about privacy? We do not want all the information about individuals handed to some institution; think about the bank loan application, for example. I see a new field of tension here, and I hope you understand what I mean.

Should I... okay. Robin, you start, and then I... Oh, okay, sure, though I am sure you have more to say about this. I was using the word determinism in a very naive way, and Marcello would pick on me to say that the term has different meanings. Both situations are hypothetical, and the distinction in our simulation comes down to whether there is this ultimate layer of indeterminacy that we assume one can or cannot have access to; that is the only thing making the so-called distinction between determinism and objective chance. In reality, even if you think the world is deterministic, you can never have access to as many predictors as you possibly could, especially given realistic restrictions of policy and so on.
So if you like to take the pessimistic view that the world is de facto objective chance, that everything is ultimately random because there are simply things that are unknowable, whether for legal reasons or cosmological reasons, then perhaps the message we have is negative: in general situations, perfect fairness, with all criteria simultaneously satisfied, may not be achievable. Thank you.

If I can add one thing: it is a great question, and a very difficult one. A discussion Robin and I have had multiple times is: what is this notion of risk we are trying to capture? You could think that ultimately there is just an outcome you want to predict, and that outcome is essentially zero or one, the event either happens or it does not, and presumably there has to be some deterministic path that leads to that outcome; that seems a plausible view of the world. On the other hand, it could very well be that the outcome is just generated by the flip of a coin, without any clear deterministic path. The really difficult question is which of these conditions we are in, and this we cannot test; we cannot tell. And since this has such an important impact on the ability of the algorithm to make predictions, and on its ability to deliver fairness, it is something we need to think about seriously, because it is ultimately a non-testable assumption. We cannot know whether the algorithm is performing badly, in terms of accuracy or fairness, for reasons inherent to the algorithm itself, or because of this fundamental difference in how the world works. Thanks.

Okay. We have a few questions that are more on the application side and a few that are more on the technical side; I am going to keep going with the technical ones and come to the more applied ones after, just to keep the flow going. The next question, and I am so glad we have had computer science and philosophy and now a psychologist, comes from Michael Smithson. It is a statistical one, which you will understand better than me; I am just going to read it out: often predictor coefficients differ across relevant groups; what happens to fairness when group membership is entered as a moderator into the model?

Hi Michael, good to see you here. Excellent question. The way we built our simulation, group enters in that the conditional distributions of some of the attributes differ depending on group membership. We did not enter group as a moderator, or as an interaction term, in the model, out of the consideration that, in reality, people often think group membership cannot be used as a predictor in building the model. But it is a good question what would happen if it were used as part of the construction process, or, conversely, whether the model counts as misspecified because we are not fitting it with those interactions. We have never tried, but this is a great point.

Okay. So as well as the academic approaches, we also have folks from industry here, and there is a comment on the technical side from Chris Dolman, with his insurance actuary hat on. He writes: the general approach in the insurance industry context is to reject determinism out of hand;
contracts run far too long into the future to even entertain the idea. He further argues for the predictive accuracy of the underlying riskiness, i.e. the ex ante expected loss, as what is fair, rather than measuring outcomes ex post, i.e. whether certain people claimed a lot. This is based on an economic argument more than anything else: risk should be transferred at the expected risk cost, irrespective of outcomes after the fact. What are your thoughts on this notion of fairness? If you would like Chris to expand on that, I can allow him to talk.

I think I would like to hear more, unless Robin has grasped the question fully.

No, no, I was going to say: if you could please expand on the notion, because it is not clear in my head. What exactly would you conceptualize as the risk, as opposed to using the outcome? What is the thing you are then judging the model against?

Okay, Chris, hopefully you are able to talk now; this is the first time we have tried this, so let's see. If you cannot hear me, you will not be able to tell me you cannot hear me, but maybe you can.

Okay, so I guess in the industry, and this is a topic I have written about before, I am happy to send you one of my papers afterwards, which you might find interesting, but in the industry context, the idea that you would measure who has claimed and who has not after the fact, and measure claims or non-claims in a group-fairness-type setup, just seems a little counterintuitive, because whether you claim or not is more than anything else a random draw, a draw that is biased depending on your underlying riskiness, but there is certainly a level of chance there that is important to recognize. So you essentially reject determinism, because there is always going to be some chance: maybe you do not have the data for it, or maybe it is genuine chance, we can argue about that, but there is likely to be some chance there. And if you are buying an insurance contract to protect yourself from future risk, there is a long-running argument that says, economically, what you should do is price that contract at the expected claims cost before the fact, and then some people are going to claim and some are not, but that is sort of irrelevant. What you need to do before the fact is charge people an appropriate price based on your understanding of their actual risk beforehand. So in the insurance context, some of these group fairness notions, particularly the ones dependent on outcomes, do not quite make sense. But I wondered if you had any thoughts on the topic, or whether you had thought about it at all, because it is very related to what you have been talking about.

So I think we would like to see your paper and then think about it more, but it does seem that, while in the criminal justice context you might be very interested in learning as much as you can about an individual, so that you really give an individualized risk for that individual, in the insurance context it seems you are okay treating a bunch of individuals as exchangeable with one another, so long as at the end of the day you manage to make a profit in the long run.
So that might be a difference between the criminal justice context and the insurance context: the notion of individual risk, the degree of fine-grainedness, the amount of information required, might differ between the two, and that might also be why certain notions of group fairness, as you say, do not make sense in the insurance context. Another issue seems to be that, while it might be optimal to get as close as you can to individual risk, and therefore collect as much information as possible, as Robin explained with the multi-resolution framework, that gives rise to higher variance; so the risk you determine might not actually track the long-run frequency you are going to see, and the insurance company might face more losses than expected. But anyway, I think we need to think about it more, and we want to see your paper, of course.

Thanks. I will definitely have a chat with you afterwards; I will send you some stuff and maybe we can have a chat. Yeah, definitely, thank you. Good stuff.

Okay, so now on to law. Lyria Bennett Moses, here at UNSW, writes in the Q&A: a number of standards are being drafted to deal with issues like bias and fairness in the context of artificial intelligence and/or autonomous and intelligent systems; what are the implications of this approach for how such standards should be approached?

I will try to give a stab at this. It depends, and I would like to see how these standards are written into law, but one of the messages of the talk is that there are multiple factors that affect the fairness performance of your classification algorithm: how many attributes you consider, the specification of your model, the determinism assumption, and so on. These assumptions that affect the fairness performance of the algorithm should be taken into account in whatever standards are being drafted for assessing the fairness of an algorithm. The catch, however, as Robin and I have been trying to say, is that some of these assumptions are untestable, and there I really do not know what to say, because when you draft standards for fairness, to assess the performance of algorithms, and you build them into the law, I suppose the assumption is that there are empirically verifiable criteria you can latch onto, and our point is that we are not so sure everything is empirically verifiable. So there is a degree of deep uncertainty there. Maybe that is bad news for those who are trying to draft legal standards, but I would like to hear more about those standards.

Okay. For the person who asked how to explain algorithmic fairness to a business executive who does not understand the field: if you look at our channel, there is a video of the talk by Clinton Castro, which I think is a really useful introduction to the field, targeted at a more general audience compared with this one from Marcello and Robin. Yesterday I told them that we have already done a lot of the intro work, so they should feel free to hit us with the hard stuff, and I want to stick to that. But I do have a final question, just from me, and it is coming
more from the perspective of political philosophy; it is partly just a matter of framing. I think all the things you are showing here are really interesting and important, and one thing that would be really cool would be to make some of this available so people can play with it; it could be a really useful resource. But there is a framing issue: when you present things in this light, it is as if you are saying that the unfairness at stake is a kind of cosmic unfairness. Even if we do our absolute best, the way the world is, our inability to know whether there is determinism or objective chance, our inability to capture all the information, makes unfairness an unavoidable, almost natural fact. And that obscures the ways in which the unfairness that is caused by, and revealed by, algorithmic prediction systems is very much a product of the societies we live in, of the social structures that have engendered long histories of unequal power relations. There is a bit of a danger with some of the more technical work on algorithmic fairness that it can lead to a kind of throwing up our hands: "well, what can we do, determinism," and so on; whereas really the issues causing the actual negative implications and impacts of these systems, taking COMPAS as just one example, and I have said this before in this seminar series: take any algorithmic prediction system, even one designed by God, and apply it in the context of mass incarceration in the US, with the structural injustice and structural racism you have there, and it is not going to be any good; it is just going to compound the existing injustice. So the comment is: I think it is really valuable, when doing this kind of technical work, to say at some point near the beginning, look, obviously the real problem here is structural racism in the US, and whatever we do at the margins within algorithmic fairness is only going to be window dressing on that. But then there is a substantive philosophical point as well: how do we situate the kinds of formal constraints on achieving fairness that you have described alongside the social or structural constraints on achieving fairness that a political philosopher or a sociologist might bring to the fore? I think that could be an interesting part of the conceptual framework you provide, if you are able to think in those terms.

Thank you very much, Seth. I absolutely agree with what you said, and I think it is a wonderful point you are making. In light of what you said, and thinking about the first thirty minutes, this is perhaps something that did not come out very well, because the work Marcello and I did really took the inside perspective: if we were the person tasked with building a model that has to put out these predictions, what do I need to know, even just for myself, as a guiding principle, about the theoretical breakdown of what is properly achievable?
I think this is exactly why, in the end, we are pushing for this: we understand how to decompose the lack of accuracy in a predictive model, but we do not yet understand how fairness, or the lack of fairness, decomposes, and that is exactly the kind of theoretical guiding principle that needs to be there, so that we know... sorry, where was I? Right, exactly. As I was saying, we do not yet have a theoretical guiding principle; that is where the decomposition comes in. We have not answered the question at all: what if we do have garbage data, what would that do to the model, exactly how far away are we, and how do we peel that apart from the model's performance? That is the question to answer. Taken all together, looking at things piecemeal is one thing we are doing practically, but beyond that we need this higher-order sense of what we should expect of these things.

Yeah, and I think there is also the possibility that the risk scores themselves will contribute to the background injustice: you will end up, for example, sending more police to an area that has already been over-policed, and thereby generate more arrests, or whatever you use as part of the data. Now, we have hit our time, so we will have to continue over on the Slack. Everyone here, thanks so much, Robin and Marcello, for doing this; it was really great, just the kind of research we want to see as part of this series. Let's give our speakers a round of applause, and if you are able to head over to the Slack to answer any further questions, please do; people in the audience, please head over there too, Michael has put up a link to join, and then we will call it a day. Thank you.