Hello everyone, welcome to our first seminar of 2022. We have Marcus Hutter here from DeepMind, but before I pass over to Marcus, I'll do a quick acknowledgement of country. I'd like to acknowledge the traditional custodians of this land, the Ngunnawal and Ngambri people, on whose land we host this seminar, with our respects to elders past, present and emerging. So, take it away, Marcus.

Hello everyone, thanks for showing up in such numbers. Normally I work on AGI, but I got involved in this HMI project very early on. Seth really wanted me to be in this project, and I thought maybe I should do something in this area, and I was probably the first one in HMI to do some research. Beyond this work I didn't work in this area anymore, but there's plenty more to do with respect to this idea, and you will see it's really just an idea. I thought this idea can't be new, and I still sort of think that maybe 50 years ago somebody already had it, because it's really simple. On the other hand, I talked about this to various people, and somehow even they had difficulties understanding it, although I think it's really, really super simple. Making it work in practice, of course, there's more to do.

Okay, let me start. This is about fairness, and "without regret" means the following: there are various ideas out there to achieve fair solutions, but often they come at a cost. Bob Williamson has a paper about the cost of fairness. I could have called it cost, but regret is the difference between an optimal solution and what we can achieve, and for the approach I present this difference is actually zero, which I think is quite nice. First I give the motivation and the main idea. It's a bit abstract, but don't worry, I go through a running example throughout the talk. Then I present the classical approach to fairness by constraining the solution space, then the main idea of fairness without regret, and then some details which I don't have time to cover here, but which are also important.

Okay, first some terminology. The goal is to optimize some primary objective while also caring about a secondary objective. When I say optimal, or best, or solution quality, or regret, or that something is relevant or comparable, I always refer to the primary objective. When I say fairness, or just, or equitable, that refers to the secondary criterion we also care about. This is quite abstract; for this talk it doesn't really matter whether we call it just, equitable, or fair: it is just some mathematical function. So I usually use the word "fair" without implying that these notions are all the same. These details don't matter for this talk.

Okay, so here is the classical way of achieving a fair solution. Let S be a solution space, say the potential students you can admit to a university. Then you have some objective which you want to maximize: maybe you want to admit the students with the best high-school grades, or the highest IQ, or whatever. There's a scalar admission criterion, and you just select, say, a couple of students which maximize this criterion, whatever it is. Okay, sometimes these solutions appear to be unfair, and we want to do something about it. A popular, somewhat crude and brute-force approach is to just constrain the solution space: rather than considering all potential solutions, we limit the solution space to solutions which we deem fair. The problem is that in general this reduces the solution quality.
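[A minimal formalization of the two selection rules just described, in notation of my choosing (calligraphic S for the solution space, u for the scalar objective); the slide's own symbols may differ:]

```latex
S^{*} \;=\; \operatorname*{arg\,max}_{S\in\mathcal{S}} u(S)
\qquad\text{(fairness ignored)}
\\
S^{*}_{\mathrm{fair}} \;=\; \operatorname*{arg\,max}_{S\in\mathcal{S}_{\mathrm{fair}}} u(S),
\qquad \mathcal{S}_{\mathrm{fair}}\subseteq\mathcal{S}
\qquad\text{(fairness by constraint)}
\\
\mathrm{Regret} \;=\; u(S^{*}) - u(S^{*}_{\mathrm{fair}}) \;\ge\; 0
```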
So fair optimal solutions are inferior to what you get if you just ignore fairness. That's also sometimes called the accuracy-fairness trade-off; there's a paper on this by Bob Williamson from 2018. Okay, there's another problem: often we should not be, or cannot be, very certain about the objective itself. Think about student admission, where we have all kinds of criteria. How do we aggregate them into one number? Nobody would argue there's a unique way: what is more important, is a grade more important, or your extracurricular activities, or IQ, or whatever? We have some feeling that all these things are somehow relevant, but what's the relative weighting, for instance, or are they even additive or multiplicative, and so on. Or if you think even grander, about life goals: the life goals may be food, shelter, family, education, entertainment, health, wealth, whatever. We can put all this into a single number, but I don't think we will ever agree on a unique way of doing it.

Okay, but I will turn this problem into a feature: I use the fact that the primary objective is often itself uncertain in order to get fairer solutions. So instead of looking at a single objective, let's be more honest and look at a set of objectives. Assume you have a committee and we can't agree on the right criterion; maybe we just take all the suggestions from the committee, or their convex hull, or whatever. Then we have a set of objectives. For each objective u_θ (this parameter space can be discrete or continuous or whatever; I give examples later), we consider the optimal solution. So for each utility u_θ, we get an optimal solution S*_θ. And by design, these optimal solutions are incomparable, right? The optimal solutions under criterion A and under criterion B are each optimal in their own right, and you cannot really say which one is better, unless you had a meta-criterion or something like that.

Okay, so now we have this set of optimal solutions, and we consider a secondary criterion, a fairness criterion. We say a solution has a certain fairness, a real number, and we want to maximize fairness. We can easily do that by picking, out of this set of optimal solutions, the one which is deemed most fair. And because all these optimal solutions are incomparable, we're not really compromising on the primary optimality criterion. So we take all the optimal solutions with respect to the different θ and take the argmax over θ, and then we have the optimal fair solution. In a sense, that's all there is to it, and the rest is details: an example, and how it might work.

But before I do that, let me say what I'm not talking about. First, there are no probabilities, no machine learning, no fancy optimization algorithms here, although we do need fancy optimization: this is a bi-level optimization problem, which is very tricky, and for large-scale problems we need to solve it. There are packages, but custom-made solutions would probably be useful. I also will not discuss how we come up with fairness notions and objectives; you just have a primary objective and a fairness notion.
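[Likewise, the fairness-without-regret rule just described, in the same improvised notation (u_θ for the parameterized primary objective, F for the real-valued fairness score); the outer argmax over θ subject to the inner argmax over S is the bi-level structure Marcus mentions later:]

```latex
S^{*}_{\theta} \;=\; \operatorname*{arg\,max}_{S\in\mathcal{S}} u_{\theta}(S)
\quad\text{for each }\theta\in\Theta,
\\
\theta^{*} \;=\; \operatorname*{arg\,max}_{\theta\in\Theta} F\big(S^{*}_{\theta}\big),
\qquad
S^{\mathrm{fair}} \;=\; S^{*}_{\theta^{*}}
```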
And in this way we can get fairer solutions. I also don't consider bias in the data, which is a big issue: you have to debias the data somehow, which is difficult, but maybe not impossible. I'm also not presenting a ready algorithm, unless you just take an off-the-shelf bi-level optimization algorithm, which may be good enough. So there's plenty more left for you if you want to work on this.

Okay, I think I can skip through this. There's related work on bias and fairness and on fairness-enhancing interventions. And this notion of fairness is, as you probably all know, very contentious and controversial. Brian Hedden, who gave the first HMI talk here, had a very interesting take on this: how these different notions cannot all be satisfied at once. Essentially, you can only have calibration; if you want calibration, then all the other traditional fairness notions go out of the window. The bias in the data I will not address. And here's some work which achieves, or tries to achieve, fairness by constraints, which I will argue is inferior to this new idea.

Okay, so how does fairness by constraint work? You have an overall score u. I'll just give you an example rather than the general theory; there's not much to the general theory anyway. Assume you consider just the IQ and the grade of the students. IQ is between 50 and 200 (realistically maybe a narrower range) and the grade is between 0 and 10. Both you think are relevant, maybe equally, but we cannot just add them because the scales are totally different: one runs from 50 to 200, the other from 0 to 10. So maybe we rescale the IQ by dividing by 10; then it's 5 to 20 versus 0 to 10, which now seems a comparable scale, and then you can maybe add them up. Or let's take a weighted average, θ·IQ/10 + (1−θ)·grade, and choose θ to be one half if you deem them equally important.

So here's the example. I just invented six students with IQ and grade; these are the students' initials, and here you see them on this 2D plot, horizontally the IQ, vertically the grade. The fairness criterion, on the next slide, will be with respect to gender: the blue points are male and the red are female. Okay. So S is the solution space, we maximize the utility function from the previous slide with θ equal to one half, which seems reasonable, and we get the optimal solution. In the example we have six students, and let's say we can only admit two; of course it's a very small example. The total score is just the sum of the scores of the two admitted students. So we take the argmax over all subsets of students in this population, restricted to admitting only two, and this gives us the optimal solution. In this case it's a simple threshold strategy: we look for the best student with respect to u_{1/2}, then the second best, and this gives us Bob and Zach. If you look at the diagram: since the two criteria are equally weighted, the diagonal lines here are the iso-lines, and Bob and Zach lie furthest towards the top right. So these are the best students with respect to this criterion. Okay.
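[A brute-force sketch in Python of the selection step just described. The roster below is invented for illustration; the talk doesn't list the actual numbers, and the names Carl, Eve and Isa are guesses. It is constructed so that the gender averages are IQ 120 and grade 8, as stated later, and so that θ = 1/2 admits Bob and Zach with total score 22:]

```python
from itertools import combinations

# Hypothetical roster (invented for illustration): name -> (gender, IQ, grade).
students = {
    "Bob": ("M", 150, 7), "Zach": ("M", 130, 9), "Carl": ("M",  80,  8),
    "Amy": ("F", 100, 10), "Eve": ("F", 140, 6), "Isa": ("F", 120,  8),
}

def utility(pair, theta):
    """u_theta(S): sum over admitted students of theta*IQ/10 + (1-theta)*grade."""
    return sum(theta * students[s][1] / 10 + (1 - theta) * students[s][2]
               for s in pair)

# Unconstrained optimum: argmax over all 15 two-student subsets.
best = max(combinations(students, 2), key=lambda p: utility(p, 0.5))
print(best, utility(best, 0.5))  # ('Bob', 'Zach') 22.0
```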
So here, because we admit two students, it would actually be more appropriate to plot pairs of students. We take all pairs of students (there are 15 of them), sum their IQs and sum their grades, and we get this plot; these are just the initials. And if you look at the diagonal here, you see BZ, which is Bob and Zach: they lie on the diagonal and they're optimal. It's just a different way of presenting it, reducing it to the selection of a single point in a solution space. Okay.

So that was the optimal solution, by definition. And maybe some people think this is not fair, because we admitted two males, even though, if you look at the average IQ and the average grade of the females and of the males, they're exactly the same: the average is 120 for IQ and 8 for the grade, for both. So they seem to have equal qualities, but for some reason this system admitted two men. Okay. Some approaches use quotas, say a quota of at least 30% women; with just two people you can of course only have 0% or 50%. So what the constraint approach does is decrease the solution space to the fair solutions: you have a fairness function which declares a solution either fair or unfair, and you keep only the solutions which are fair, and then you maximize over the fair solutions. If you do that in the example, you can just look at the diagram, shift the diagonal, and consider only the fair solutions: you get Bob or Zach together with Amy, Eve, or Isa. So in this case there are six equally good fair solutions. And what you also see, if you just add the numbers, is that the fair solutions get a score of 21, while the optimal solution without the fairness constraint gives you 22. So there's a cost to it: you lose either one grade point or 10 IQ points by insisting on a fair solution (this constrained selection is sketched in code below).

And that's the reason why I call it regret. This is the optimal solution, this is the optimal fair solution, and generically the difference is larger than zero; that's the regret, the cost you suffer for fairness. Okay.

So how can we avoid this? It seems impossible. But as I mentioned, we just weighted IQ and grade 50-50, and that is plausible; but we could, for instance, take the grade and differentiate between the STEM grade and the HASS grade, and then say: okay, we have IQ, STEM grade, and HASS grade, and maybe they're equally important, so weight them one third, one third, one third. If you split it this way, you effectively weight IQ one third and the grade two thirds, which also seems quite plausible. It's really hard to argue that one weighting is better than the other. So we already have a range for the parameter θ, between one third and one half, and the corresponding solutions really seem completely incomparable.

So what we do is turn this problem into a feature. We now consider a set of optimality criteria: for each θ we deem reasonable, we consider the utility u_θ, and for each θ we get an optimal solution with respect to it, the θ-optimal solution, as I should call it. As I mentioned, by assumption they're really incomparable. And then we define a secondary fairness criterion; in this case it should not be a binary fair-or-unfair, but graded somehow, more fair or less fair.
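[Before moving on: continuing the earlier sketch, the constrained approach described above just filters the candidate pairs through a binary fairness predicate before maximizing. Gender balance is my stand-in for the talk's unspecified fairness function:]

```python
def is_fair(pair):
    """Binary fairness constraint: admit one student of each gender."""
    return {students[s][0] for s in pair} == {"M", "F"}

fair_pairs = [p for p in combinations(students, 2) if is_fair(p)]
best_fair = max(fair_pairs, key=lambda p: utility(p, 0.5))

print(best_fair, utility(best_fair, 0.5))  # one of six pairs scoring 21.0
print("regret:", utility(best, 0.5) - utility(best_fair, 0.5))  # 1.0
```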
And then, within this set of θ-optimal solutions, we try to find the fairest one. So this expression on the left-hand side is the most fair solution among the optimal ones, and this one here is the θ in my set of reasonable utility functions which led to the optimal fair solution. Okay.

So in the example: I argued that one third seems fine, and maybe two thirds is also fine, so let's just take the interval between one third and two thirds, which I think is totally reasonable. And let's say the fairness criterion is minus the difference between the numbers of male and female admissions: we want to minimize this difference, or, with the minus sign, maximize it, so zero is the best. And what you can see in this simple example: for θ* equal to 0.35 (that's just one number in the interval; there's actually a whole range of θ where it works, I can't remember the exact boundaries anymore) you get the solution Amy and Zach. The fairness criterion is zero, i.e. maximized. So this is a fair solution (a grid-search sketch of this step follows below).

Okay. Now, if you take this fair solution and calculate its utility with respect to θ*, which is 20.1, and compare it to the unconstrained solution, which is 22, it looks like the fair solution is again inferior. But that is an illusion, because we are comparing apples with pears: these two quantities, values of different u_θ, should never be compared. I could easily add a constant to u_θ. Look at the optimization: here you maximize over S, so if I add any function of θ to u_θ, the maximizer is exactly the same. And in the outer optimization it doesn't matter either: the optimization criterion and the fairness criterion are unaffected. So by adding a suitable number I could make the fair solution's utility whatever I want, 100 or whatever; the fair solution could be made to look even better than the general solution. Strictly speaking, you really can't compare them; that's the whole point.

Okay, some comments. This is very generic: it doesn't rely on any specific utility function or fairness criterion F; Θ can be discrete or continuous, and the utilities can be linear or nonlinear (this example was linear, for instance). It doesn't solve the problem of which criteria and which attributes to use; that's the same as with the constraint-based solution. But in a sense it's easier, because if you're unsure about a criterion, you just put it into the pool of utility functions as well, and a mistake is not as severe as when you fix one very specific utility function which turns out to be maybe not the right one, or a suboptimal one.

Okay, so let me go through the protocol of how you should apply this. First, you choose your space S, and the attributes you want to consider in your objective. Then you choose a class of utility functions; in the simplest case these would be linear combinations, and you should choose a reasonable class. I'm thinking here about a committee: we have these ATAR scores and whatever other scores at universities, and the admissions centre comes up with all kinds of formulas, and I guess that's done by a committee.
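[A sketch of the whole fairness-without-regret step on the same hypothetical roster: sweep a grid over Θ = [1/3, 2/3], solve the inner problem exactly by enumeration, and keep a θ whose optimum is fairest. The grid is my crude stand-in for a proper bi-level solver, and on this invented data the numbers differ slightly from the talk's: the fair solution Zach and Amy is found, but not at exactly θ = 0.35 with utility 20.1:]

```python
def fairness(pair):
    """F(S) = -(difference between male and female admissions); 0 is best."""
    males = sum(1 for s in pair if students[s][0] == "M")
    return -abs(males - (len(pair) - males))

def theta_optimal(theta):
    """Inner problem: the theta-optimal pair, by exhaustive enumeration."""
    return max(combinations(students, 2), key=lambda p: utility(p, theta))

# Outer problem: among the theta-optimal solutions, pick the fairest.
# Note: adding any g(theta) to u_theta would change none of the argmaxes.
grid = [1/3 + i * (1/3) / 100 for i in range(101)]
theta_star = max(grid, key=lambda t: fairness(theta_optimal(t)))
s_fair = theta_optimal(theta_star)
print(theta_star, s_fair, fairness(s_fair))  # ('Zach', 'Amy') with fairness 0
```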
And the committee members probably don't agree. So what they should do is: everyone puts their own, hopefully reasonable, opinion in, and then maybe you just take the convex hull of all these suggestions for what a reasonable primary objective is. That gives you a class Θ and hence a class of utility functions. Then, separately, you choose some fairness criterion, what you deem to be fair. And then the rest is just computation. And you should do it in this order, right? You should not do it like I presented it: oh, I found the optimal solution; oops, that's two male students, that's unfair; let's rig the system somehow to make it fair. It's better to follow the protocol: first you fix the set of utility functions, then the fairness criterion, and then you crank the handle and ideally accept the outcome. Because once you start iterating, you start to rig the system.

Okay, so now the details I skipped over. This is a very nasty optimization problem: a bi-level optimization problem, although actually a special case, not a generic one. Maybe that would be an interesting research project from the computer science side, whether this special structure can be exploited to get more efficient algorithms, because in general it's a really hard problem. One could try (you can read a little bit about this in the tech report I have out there) gradient descent, always the first choice. It's a double optimization, but you can differentiate through the argmax, and you get some sort of dual, alternating, saddle-point optimization, like in games. Unfortunately, for the linear case it's easy to see that this completely breaks down. On the other hand, if the problem is linearly parameterized, it is very close to multi-objective optimization and Pareto optimality. So you take the multi-objective optimization problem and look at the Pareto front, then you clip the Pareto front to the allowed solutions, those which are θ-optimal for some θ in the set Θ, and then you find the most fair solution on this clipped Pareto front. That is an algorithm for this bi-level optimization in the special case of linear combinations, and there are algorithms out there for finding the Pareto front (a sketch follows below).

Okay, now from the machine learning perspective. Usually we do some machine learning here, right, and there's been no machine learning so far. So rather than hand-designing utility functions, people have data: we have past students and their later success, whatever that is, income, or positive contribution to society. Maybe that can inform and improve the utility function or objective we are designing, rather than just guessing it by hand. And indeed it can, but it will never shrink the set down to a point. First, we only have finite data, so we can only estimate this parameter to finite precision. And even with infinite data, we would probably disagree about what it means for a student to be successful later: is it annual income, or lifelong income, or positive contribution to society, or whatever? Then we just have the same disagreement at a higher level, and we would get a tri-level optimization problem.
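[For the linearly parameterized case, a sketch of the Pareto-front recipe just described, continuing the earlier code: enumerate the Pareto front of (total IQ/10, total grade) over pairs, clip it to the pairs that are θ-optimal for some θ in [1/3, 2/3], and take the fairest point. The clipping is done crudely via the same θ grid rather than exactly:]

```python
def objectives(pair):
    """The two linearly combined sub-goals: (sum of IQ/10, sum of grades)."""
    return (sum(students[s][1] for s in pair) / 10,
            sum(students[s][2] for s in pair))

def dominated(p, pool):
    """True if some pair is at least as good in both objectives and better in one."""
    return any(all(a >= b for a, b in zip(objectives(q), objectives(p)))
               and objectives(q) != objectives(p) for q in pool)

pairs = list(combinations(students, 2))
pareto = {p for p in pairs if not dominated(p, pairs)}

clipped = {theta_optimal(t) for t in grid}  # theta-optimal for some theta in Theta
print(max(clipped & pareto, key=fairness))  # ('Zach', 'Amy') on this data
```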
So, while machine learning can help to make this space narrower, it will still be a set. Okay. Oh, I'm pretty good on time; that rarely happens.

So, the summary and discussion. Perfect fairness cannot always be achieved... oh, sorry, I didn't mention that yet. While this approach, which I call fairness without regret, can improve fairness, and in this example even achieves perfect fairness, it does not necessarily achieve perfect fairness: if the set Θ is too small, then even the most fair θ-optimal solution, which is usually fairer than what you get from a fixed criterion, may not be perfectly fair. On the other hand, if you constrain your solution space, you get suboptimal solutions, which is not good. And this fairness business is pretty controversial: there are so many different notions, and people cannot agree on them. So maybe it's actually better not to strive for perfect fairness with respect to a questionable criterion, but just to improve fairness with respect to a criterion we can more easily agree on.

Yeah, and as I said, if you work with a set of objectives, maybe we cannot agree on the set either. But I would argue it's easier to agree on a set: in the easiest case, you have five experts, you just take the convex hull of their suggestions, and that's it. It should be easier to agree on a set than on a single point. Some ethicists don't like machines making ethical decisions, but if you look at the protocol, four out of five steps are under human control; only in the last step, the optimization, does the machine take over. So there's a lot of human control. But, as I recommended, do the steps in order, so as not to rig the system. And I made a lot of simplifying assumptions, especially that the data is not biased. Well, that's how science works. These last points are the critique I got when giving this talk previously; there's even a paper about the abstraction fallacies in fairness research, five of them or so. Of course I'm aware of that, but you start with a simplified problem setting, and then of course it has to be embedded into a practical solution, which someone else has to do. For those who don't know: this is not my area of research; I usually work on theoretical general AI, and this was a brief stint into fairness in machine learning. I'm not working on it anymore. Okay. I don't claim that this is perfect, but what I do claim is that this approach, fairness without regret, is at least superior to fairness by constraint.

It doesn't solve all problems, so here's what remains to be done. We can have stochastic uncertainty in the data, and also stochastic uncertainty in the objective; maybe the objective can even only be evaluated probabilistically. I haven't treated that, but it should not be too hard. If you have missing data (I don't have it on the slide), you have to do data imputation. And it looks like you could just use a similar approach there: impute the data in such a way that you get a fair solution. But that would actually give you an extremely unfair solution.
You can read in the tech report why data imputation doesn't work the same way as this objective-function "imputation" does. Okay. As I mentioned, biased data I have not treated, which is a big issue. I mentioned machine learning could help, so it would be nice to combine the two. And maybe the most interesting next step would be to quantify how much fairness you gain by this approach: you look at the optimal solution and the optimal fair solution and compare their fairness levels. I think you cannot say much in general; you have to look at a concrete instantiation of the problem, at the utility functions and the fairness criterion. Of course, the larger the space Θ of utility functions, the fairer the optimal fair solution will be, because the set includes more and more optimal solutions. And finally, there is this very hard optimization problem, which has yet to be solved. Okay, here are some references. And, wow, that's the end of the talk, and I think it's the first time in my life that I've ended early. Thank you very much.

I don't know if people on Zoom can hear me, because we have some mic problems, so maybe you can speak into this mic and share it. I guess if people on Zoom have questions, they should write them in the chat; that's going to be easier. But maybe I can start with a question. This approach is presented for the case where you make a single, one-shot decision. Can you apply it if the decisions are continuous?

You mean sequential decision making? I mean, if we're talking about making decisions about loan applications, where you get new online instances every five minutes for the next 10 years, sort of reinforcement learning? Or do you mean: I get new instances, and then I want to update my previous solution because I have more students trickling in?

You have more knowledge.

More knowledge in which sense? You get more instances with their results?

Yeah, more students, right?

No, that's different. More students would be: you have more students and kick some previous ones out. More knowledge would be that at the beginning you don't have all their grades or something, and then the grades arrive. I haven't thought about it in this context, but at least abstractly, if you have new knowledge about your students, you just run the system again and get a new solution. Practically, of course, you would hope there's an incremental way: with gradient descent you would start from the previous solution and adapt it, or something like that. I'm not sure whether there's anything specific to this idea in relation to new information. In the bi-level optimization literature there's probably something on how to update your solution when you have a parameterized problem and change the parameter slightly.

Then we'll ask Kathy, and then Saripa.

I just wanted to say, I think this is really a great intervention in this debate. It really rings true, I think, with the way that people rationalize affirmative action in many contexts. For instance, in university hiring, we're all aware that we want to, say, have more women in faculty, and legally you're only supposed to use fairness as a tiebreaker.
But I think in practice what people have realized is that in the past we've been working with a very narrow notion of merit, of what makes a good faculty member. And if we allow for different criteria, and allow some incomparability between being good at certain utility functions, as you say, then there's a broader space of maximal people that you can apply a tiebreaker to. The tiebreaker rule seems quite restrictive, but if you have a broader idea of what makes a good candidate, then you get more candidates coming in. And it captures exactly the kind of thing that you're trying to model here: having a multi-objective goal where it's not clear how to weigh up the objectives. So I guess this is just a comment to say: it looks sort of fancy and technical, but this is exactly what people want to do when they're implementing affirmative action measures, I think.

I was curious that you called my talk an intervention in the discussion. I wouldn't call it a tiebreaker, because it doesn't really break ties: it allows for a non-total ordering and for incomparable candidates. Tiebreaking sounds like: this candidate has 0.7, this one has 0.7, and then you break the tie. But maybe that is what you mean. Here the candidates are really incomparable in the sense I have defined. But yes, it's very similar.

Yeah, I just wanted to cheer on this way of doing things. I think it's an intervention compared to the standard way the algorithmic fairness discussion has gone, where there's no questioning of how we should conceive the goal, the objective function, and it's all about how we measure, how we balance, fairness versus best achieving the goal. I like the fact that you're questioning what the goal should be.

Thanks. Which is very interesting, right? If universities are already trying to do something similar, you would expect the fairness community to try to formalize this approach, but apparently that has not been done. Who knows, but I couldn't find anything in the literature, and so far nobody has pointed anything out.

We want to maximize renewable energy sources. But because of the network structure, the network constraints are more sensitive to some of the customers, so it is not really beneficial for the network to allow those customers to install renewable energy sources. But that is not fair. So how would this approach apply here, given that the objective is very concrete: you want to insert as many renewable resources into the system as possible?

I don't know the setting exactly, but I can imagine: if the power plants or whoever have a very precise criterion for what is good, or environmentally friendly, and everyone agrees on it, then this approach doesn't help you, right? Because it says: if that's the optimal solution, you have to use it. But usually, right? For instance, how do we measure the environmental impact?
I mean, there's large disagreement about how to weigh future costs, and about what the CO2 will lead to over how many generations. But you should be honest and estimate: this is the smallest plausible estimate, this is the largest, and then you have a range for what it costs not to have the renewable energy, or not to allow these people to feed power into the grid, or whatever. It should not be from minus infinity to infinity, but a range where you feel the truth is somehow captured. And then you crank the handle. And if there is a point in this utility space for which these users can feed in without causing too much harm, then you will get a fair, or more fair, solution. But if even then they cannot feed in their electricity (or maybe I didn't understand the setting completely), then they're out of luck in this framework too, right? Then you have to go to hard measures like fairness constraints, and then you really do reduce the overall solution quality for society.

Hi, thanks. This is really awesome. I want to push on Alba's point, because I think there's a lot of interesting stuff there. In particular, I think it comes up when you're thinking about a decision problem that has both a component of distributing goods, where you think fairness should play a role, and a component of learning, where you might want to keep fairness apart. When you're making a one-shot decision, you're absolutely right that there's so much uncertainty in how you justify your objective that you might as well choose the one that also satisfies your fairness considerations. But once you're in a sort of online learning situation, once you start doing that, you're biasing the dynamics of that learning towards a particular path in the space. And here's the example; I'm going to think about IQ, since that's one you talked about here. You might think there's this one thing that's trying to do both jobs: on the one hand, it's trying to measure particular robust aptitudes in a population, and on the other hand, it inevitably feeds into how we distribute goods in society. And there's so much uncertainty in that objective: it actually breaks down into a bunch of different components and different aptitudes, which have different weights, and there are different choices of how to measure each component. So you might say: okay, given that it's going to feed into this distribution problem, let's choose the variant under which IQ is most fairly, most evenly, distributed across the population. But the problem, and this is how I think it's reflected in the debate about this sort of scientific practice, is that when you do that every single time, you're cutting off a section of science and scientific exploration that would have looked at the different ways you could have measured it. So you're not just making a decision for this one-time distribution; you're making a decision in perpetuity for science, that this is how we're going to do it every time.
And somehow that feels to a lot of people, I think, like it unduly biases the direction of science.

I most likely have not fully understood the question or the comment, so let me try. There were two things, which are maybe somehow related. One is: if you just take IQ, there are multiple facets, right, and one aggregate number is not enough to really capture, say, the diversity in what society needs. In that case I would just say: you break it up. In the IQ test, for example, there's a visual part and a linguistic part and the mathematical stuff, right? You break it up, and then you have three scores, and you apply the same technique: rather than one score, you use these three scores. The other question, I think, was more that over time this will bias the system, or society, more and more, because it's used over a longer time period.

I didn't quite mean choosing just the one; I think that rubs people the wrong way. But yeah, it biases the learning, right? If the learning actually has something to do with it: you use this measure, you take a measurement, and then you feed it back.

Can you give an example of how the learning biases the system, or society, in the long run?

In general, people settle on one measure so they can compare results with each other; they standardize what IQ is, right? And the concern is maybe that if we use fairness considerations in what is supposed to be a purely learning-based thing, then every time you go out and measure things with an eye towards this measure as opposed to a different one, you're learning something a little different. And that might not be a problem, because you don't know exactly what you're trying to learn. But if everybody simultaneously is trying to optimize for one thing or another, you're shrinking, I guess, the space of what could be learned.

Okay, maybe I understood it, maybe not, but let me just answer a different question then, and maybe someone else can clarify what your real question is. There's one thing I write about in the report. Sometimes, if you have a big project or whatever, diversity is good because you need an engineer, you need a social scientist, you need a mathematician, and so on; we need this diversity to have a great team that achieves something. And maybe that's what you're hinting at: if you have one criterion and you only optimize for that, then maybe you only hire engineers or something like this, which is not good. But this is different from what I talked about. If your objective is to, say, design a rocket to Mars, with humans on it, you need an engineer, a computer scientist, a social scientist, whatever; if the objective is to fly to Mars, then in order to achieve it, we need this diverse group of people.
So you have to put this into the objective itself, right? Because if you design the objective as, say, maximizing the number of highest-IQ engineers, and then something on the social side goes wrong, then you have designed the wrong objective. And that is not addressed by this work. Some people want diversity just for diversity's sake, or say diversity is a form of fairness; then you put it in the fairness part. But if the diversity itself is relevant for achieving the objective, you have to design your objective in such a way that it achieves this, at least within this framework.

There are many questions, so we're going to move on. Patrick had a question. He's saying that you're using a lexical ordering on the objectives, and what he argues is that you should treat fairness as just another criterion, add it to the utility function, and compute the optimum from there.

Well, we could do that, but why is it a good idea? And what is the relative weighting? Do you choose the weight in such a way that you get a solution which is as fair as you want it to be? Also, they could be on different scales, right? I don't say that I don't have these problems too, but if you add up fairness and utility with a parameter lambda, and you vary lambda from zero to infinity, you get a Pareto front which trades off fairness against optimality. But how far are you allowed to go?

But when you have multiple objectives, that's what you're doing anyway, right? You have to choose a lambda to compare the objectives.

Yeah, but, okay, you can argue I choose these θ's, but I'm not sure how I would choose lambda. I have the discrepancy between male and female students on one side, and an ATAR score on the other. How would I weigh them a priori, without looking at the result and choosing the parameter in such a way that the outcome is what I want? I would have quite some difficulty.

So your answer is that this is an elegant method for not having to choose this lambda.

Yes, yes, and I get zero regret with this method. But that's important: you choose the θ's before the fact, based on your honest belief about how important these different criteria are for achieving, say, highest income later, or best grades, or positive impact on society, and without looking at the fairness part. I think that's crucial; that's the crucial difference.

Okay, maybe I have not understood, because I thought this is what you're doing. You've got a whole range for θ, right? And then basically you find the set of optimal solutions, one for each θ, which gives you a Pareto front restricted to this θ set. And then, I thought, among those points on the Pareto front, you pick the ones for which the fairness is optimal.

Yeah, but the point is that I don't take the whole Pareto front, only part of it. And by the way, this Pareto front picture only works if you linearly combine criteria; not if I combine them multiplicatively or so. But okay, let's first stick with linear combinations. I only look at the part of the Pareto front where I believe the weightings are reasonable. So assume I have IQ and grade, right?
It would be quite disingenuous to say that IQ is not relevant at all, or that grade is not relevant at all, or relevant only at 5%; those I definitely exclude. I would say maybe anything between 20% and 80% is fine. And that's the important thing: you restrict it to a reasonable range, and then you look for the optimal fair solution in this restricted range. And second, this Pareto front picture, the multi-objective optimization way of dealing with this, only works if you have linearly combined sub-goals. The objective could be an arbitrarily complex, nonlinear function of the sub-goals, and then you cannot read it off the Pareto front. Normally the Pareto front is: one criterion here, one criterion there, you have the diagram, and every point on the Pareto front is optimal with respect to some specific linear combination of the criteria. But linear is just a special case, which happens to be common; the objective can be much more complicated. Well, then it's a bi-level optimization problem and you just crank the handle.

I think you've had your hand raised for quite some while.

It seems to me there's what I'm going to call a meta-fairness kind of problem, because we're using fairness in two contexts here. It seems like if you were to run a competition, and you were going to tell the competitors that you would grade them this way: there is a range of possible objective functions, we're going to score all the entrants and pick the winner according to each of those possible objectives, and then we're going to apply the fairness criterion to choose, after all the scores are submitted, which objective function we use. That doesn't sound like a competition I'd want to take part in. It seems unfair; it seems like it's subject to manipulation, maybe, right? Somebody who came along after the fact could poison the objective functions under which they don't do well, so that those objective functions yield unfair solutions and don't get used, forcing you on the back end to use the objective function that's favorable to them. You're picking how grades are assigned after the scores are submitted, so you don't know up front what you need to do to win. That doesn't seem, in some sense, fair, though in a different sense than the word is used in the talk. Am I missing something?

Okay: would you say that constraining the solution space is more unfair, equally unfair, or less unfair than this system?

If you just say, okay, we admit 30% women, that's it; that doesn't feel like it's subject to adversarial behavior. It doesn't feel manipulable in the same way that this is.

Okay, so I don't think that it's in any way manipulable, because the attributes are sent in, right, the scores and IQs, so nobody can mess with those. And you should actually design the utility class before looking at the students; maybe I should put into the protocol when you are allowed to look at the students, just between steps four and five. You fix what you believe is important and get your relative weighting, and again, you don't look at the students. But I see where you're coming from.
And then you look at this bi-level optimization and say: okay, we have this pool of optimal solutions, and now we're picking around in it just to make it fair, and somebody loses out because of this. But what's the alternative? We have this set of optimal solutions; how would you pick? They're incomparable by design. You could pick randomly; I don't know, maybe that's fairer. But then we would push fairness completely out of the window.

Would you say they're incomparable by design? This occurred to me a couple of times during the talk, so I'll take advantage of having the mic to ask: aren't you using the fairness criterion to compare things that you repeatedly say are incomparable by design? Isn't that what the fairness criterion does?

Well, that is the secondary criterion, right? The view here is: we have a primary objective, whatever it is, and some people think fairness is also important and want to integrate it, but we don't want, if possible, to degrade the primary objective. The best thing would be if we could achieve both, rather than having a trade-off, and you can do that with this method. Maybe you could argue that expanding the primary objective to a set is a trick to sneak fairness in. But I would argue that having a point utility is actually very hard to justify, so it's much more honest to say: we have these potential criteria, and from the primary-objective perspective we cannot really tell which is better. And because we cannot really tell, why not choose a fair solution? It doesn't cost anything; there's no regret. Of course, the student who loses out would not like it; he would prefer to randomize or whatever. But someone will lose out.

Sorry, it's one o'clock, so we're going to end this now. But Marcus is going to be around for another week or so, so you can ask him things directly. We're going to have a five-minute break, and then a short interview about your experience at DeepMind.

Yeah. And afterwards I have the rest of the day free, maybe after lunch; I'm on holiday, as you can see. So just come to me, and I can schedule you for today, or some other day if you send me an email.