Welcome to this fifth podcast of Robustly Beneficial. Today, we're going to discuss the first chapter of a PhD thesis from EPFL on preference learning from comparisons, by Lucas Maystre, who graduated two years ago, in 2018. I usually really enjoy reading the introductions of PhD theses, because people make an extra effort to give a general overview of the problem, which is very nice. And this one is, I think, of particular interest to AI safety. Yeah, indeed, because for the field of AI safety, learning the preferences of humans is a central part of addressing AI risk. It's called the alignment problem, basically: since we can't directly specify human preferences, because they are too complex, we want algorithms that query humans for information and learn what our preferences are. This is actually a very old problem, and the first models are quite old too, about a hundred years old. The first basic model was by Thurstone, I don't remember his first name, and it's a very simple model that I really like. The starting point is that if, whenever you queried people, what they told you was always consistent and meaningful, then preference learning would be easy. But one of the great difficulties with humans is that we often change our minds. Whether that's a weakness or something good, you have to take it into account: depending on context and on numerous extraneous factors, we often change our minds, so we also need to model this uncertainty. Thurstone's basic model is that we have cardinal preferences, a score for each option, and whenever we face a choice, some noise is added to these scores for that particular comparison. That's the noise model. Because of this noise, there is some uncertainty about which option the user will say they prefer, and you can use this to learn the preferences even though the revealed preferences are inconsistent. I think this is a nice model. The intuition is that the further apart two options are, when you have a very clear preference between them, for instance killing all of mankind versus saving all of mankind, then you should not hesitate: there is essentially no uncertainty. But if the options are very similar, then you may flip often between them. Yes. To explain this better, this model is also used for ranking chess players. If two chess players are very different in level, then we can be close to 100% sure that the better one is going to win. But if they have very similar levels, then we are very unsure. Another possible source for this noise is not that a single person is inconsistent and changes their mind, but that we want to measure the preferences of a population in which people disagree. If we pick someone at random from this population and ask for their preference between A and B, it might be that one fraction of the population prefers B and another prefers A. What this model would then be doing is evaluating these preferences and generalizing them to create a ranking over a larger number of possibilities.
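To make Thurstone's noise model concrete, here is a minimal sketch of how it turns a score difference into a probability of one option being chosen. The Gaussian-noise formulation and the notation are the standard textbook ones, not code from the thesis.

```python
# Minimal sketch of Thurstone's comparison model (standard formulation, my
# own code): each option i has a latent score s_i, and when a person compares
# i and j, independent Gaussian noise is added to each score before choosing.
import numpy as np
from scipy.stats import norm

def prob_prefer(s_i: float, s_j: float, sigma: float = 1.0) -> float:
    """Probability that option i is declared better than option j.

    With noise e_i, e_j ~ N(0, sigma^2), i wins when s_i + e_i > s_j + e_j,
    which happens with probability Phi((s_i - s_j) / (sqrt(2) * sigma)).
    """
    return norm.cdf((s_i - s_j) / (np.sqrt(2) * sigma))

# Far-apart scores give near-certain choices; close scores give near coin flips.
print(prob_prefer(5.0, 0.0))   # ~1.0: clear preference
print(prob_prefer(0.1, 0.0))   # ~0.53: nearly indifferent
```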
So what's nice in the introduction of the PhD thesis is that there's an application of this to determining how awful different crimes are. But it's very old data, so some of the judgments look strange to us: for instance, adultery, seduction or abortion were rated as extremely bad by these people, and maybe we would disagree now. Maybe we don't have the same preferences as people used to have. Yeah, that's true. One question that's quite interesting is why we learn preferences from comparisons and not from something else; for example, we could learn your preferences by asking you to score every alternative. The answer is that it's very easy for us, when we look at two things A and B, to say which one we prefer, whereas assigning a score to one possibility without looking at all the other alternatives is a much more difficult task. For example, there was this work where they learned an objective function for a neural network from comparisons provided by humans. The example they use is getting a small virtual creature to do a backflip, where it's very difficult to write down an objective function: what does it mean to do a proper backflip? But simply by showing humans maybe a thousand examples of different moves, they could learn from these comparisons, because it's easy to tell which of two examples is the better backflip. Yeah, and I guess there's another historical example, which is the one Mark Zuckerberg used; I think it was called Facemash. That was not very ethical: he basically asked students at Harvard to rank how pretty different girls on campus were, and the way he did this was with comparisons, asking for people's, I guess male students', inputs on the question. Yeah, so this turns out to be quite effective. I'm not sure it's the ultimate method, maybe it's still better to combine it with other approaches, but when you have preferences that are very complex to describe, like explaining why a given crime is really awful, which is not easy, then using comparisons is a much more practical way to go. Now, a lot of the whole thesis is actually based on an assumption that we may question, and that is disputed, which is independence from irrelevant alternatives. The assumption is that if I first ask you to choose between A and B, maybe you prefer A to B. But if I now add option C, so you have to choose between A, B and C, then it seems irrational to suddenly change your mind and now prefer B. That's at least the assumption made throughout the thesis. To be a bit more precise, it says that the fraction of the time you choose A over B in the first case is the same as the fraction of the time you choose A over B in the second case. That's independence from irrelevant alternatives, and it's really a central assumption in all of this work. But it actually turns out that empirical psychology shows that this does not hold in practice. Yeah, it's easy to propose alternatives to people in a way that shows it no longer holds. Yeah.
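For intuition, here is a small sketch, entirely my own illustration rather than anything from the thesis, of what independence from irrelevant alternatives means under a Luce-style choice model in which each option carries a positive weight.

```python
# Independence of irrelevant alternatives under a Luce-style choice model
# (my own toy example): each option has a positive weight w, and the
# probability of choosing it is its weight divided by the menu's total weight.
def choice_prob(weights: dict, option: str, menu: set) -> float:
    """P(choose `option` | menu) = w_option / sum of weights on the menu."""
    return weights[option] / sum(weights[o] for o in menu)

w = {"A": 3.0, "B": 1.0, "C": 2.0}

# Fraction of A-vs-B choices that go to A when only {A, B} is offered:
p_ab = choice_prob(w, "A", {"A", "B"})

# Same fraction when C is also on the menu (condition on picking A or B):
p_abc = choice_prob(w, "A", {"A", "B", "C"}) / (
    choice_prob(w, "A", {"A", "B", "C"}) + choice_prob(w, "B", {"A", "B", "C"})
)

print(p_ab, p_abc)  # both ~0.75: adding C does not change the A-vs-B ratio
```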
So there are examples from Dan Ariely. He talked about this in his book, at least, where you have these kinds of offers and people switch: if there's only A and B, they say they prefer A, but if there's A, B and C, they go for B. So this was an important question: is the model flawed, or should we still use it? Is the model trying to capture something that's correct despite all our irrationality? We did a poll yesterday in our reading group, and essentially everybody agreed that this switch of preferences is irrational, and that it's better for the model to learn something consistent. I don't know, maybe some people disagree with this, but at some point there's a difference between merely describing people's preferences and what you actually want in the objective function of an algorithm, I guess. Sorry, what do you mean by this? One of the basic ideas, if you want to solve alignment, is that you should make sure the objective function of the algorithm is the one a human would have. But arguably the human would flip depending on which set of choices you propose, and I guess we all agreed yesterday that this is not what we should do, that we should not blindly follow humans' revealed preferences. The way I see it is that we should not follow humans' behavior, because the behavior is irrational; but as for the preferences themselves, as we discussed yesterday, I don't quite see how preferences could be irrational in some sense. Yeah, so I guess there's a terminology problem as well. At least we can say that the revealed preferences of people, what people say they prefer, are often inconsistent. There were examples where, depending on how you phrase the question, people answer differently, even though it's the same question. And with irrational preferences, in some cases you would end up spending a lot of energy to change nothing at all: if you have a cycle where you prefer B to A, C to B and A to C, someone could trick you by proposing these alternatives one after another; you would always agree to pay a little bit of money to switch, and in the end you keep paying while going around in a circle. Yeah, so imagine you have a very powerful algorithm and it's able to move from A to B, but it has to kill a kitten to do so, and the same from B to C, and the same from C to A. The algorithm would just loop between A, B and C, killing a lot of kittens in the process. And that seems undesirable. Yeah, so that's why, as we said yesterday, it seems totally fine to model preferences as perfectly rational, fitting a small set of axioms that we want the definition of preferences to satisfy, and to consider that what we observe from humans' behavior is something slightly different from what we would observe from true preferences. So I would say that IIA, independence from irrelevant alternatives, is false for humans, but it shouldn't be false for algorithms. Yeah.
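To make the cycle argument from a moment ago concrete, here is a tiny sketch, my own toy example rather than anything from the thesis, of how an agent with cyclic preferences can be made to pay indefinitely without ever improving its situation.

```python
# A "money pump" on cyclic preferences (my own illustration): an agent that
# pays a small fee for each swap to an option it strictly prefers keeps
# paying forever and ends up exactly where it started.
def money_pump(prefers, start, fee=1.0, rounds=6):
    """`prefers[x]` is the option the agent strictly prefers to x."""
    spent, state = 0.0, start
    for _ in range(rounds):
        state = prefers[state]  # the agent happily pays to move to the preferred option
        spent += fee
    return state, spent

cyclic = {"A": "B", "B": "C", "C": "A"}  # prefers B to A, C to B, and A to C
print(money_pump(cyclic, "A"))  # ('A', 6.0): back at A, six units poorer
```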
Yeah, another thing we discussed yesterday is the noise model. The way I described the Thurstone model, for instance, there was some noise added to your preferences, the true preference plus some noise, but there are other kinds of noise models we could use. Typically, when people are using the internet, sometimes they just misread things or click randomly or arbitrarily on some button, and you have trolls as well. So you should expect that a fraction of the data is just completely nonsensical. Yeah. And that's not something that's modeled, at least in Thurstone's model, and it's probably something we should take care of. But there's a very natural way to do this with robust statistics, which is something we've already discussed in this podcast. Robust statistics is basically about this; sometimes it's called adversarial learning or something like that, because you imagine an adversary, but it's not necessarily an actual adversary: it can just be people doing something wrong. You get this bad data, which we can call adversarial or not, and you have to be robust to this kind of noise. Yes. The specific problem we had with this model is that the probability with which we expect a given observation converges to 0% or 100% as the difference between the two options grows. But we expect that if people use smartphones, for example, at least 1% or 2% of the time they will mistakenly put their finger on the wrong answer. So in practice we should not expect to observe from humans anything whose probability really converges to 0%. Yeah. And the robust-statistics framework fits in naturally, especially if you do gradient descent, which is standard for many algorithms: learning is often done with stochastic gradient descent, and there's a step where you estimate the gradient as an average over the data. Instead of using the average over all the data, you can use a robust estimator, and that fixes this problem nicely.
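As a rough illustration of both points, here is a minimal sketch with assumptions of my own, a 2% misclick rate, a Bradley-Terry-style choice probability, and a coordinate-wise median as the robust estimator, showing how a misclick floor keeps probabilities away from 0% and 100%, and how a robust aggregate of per-example gradients resists a handful of troll inputs.

```python
# Two fixes sketched here (my own assumptions, not the thesis's exact model):
# (1) a "misclick" floor so choice probabilities never reach 0% or 100%,
# (2) a robust aggregate of per-example gradients for gradient descent.
import numpy as np

def choice_prob_with_misclicks(score_diff: float, eps: float = 0.02) -> float:
    """Bradley-Terry-style probability of picking the first option,
    mixed with an eps chance of answering uniformly at random."""
    clean = 1.0 / (1.0 + np.exp(-score_diff))
    return (1 - eps) * clean + eps * 0.5  # stays inside [eps/2, 1 - eps/2]

def robust_gradient(per_example_grads: np.ndarray) -> np.ndarray:
    """Coordinate-wise median of per-example gradients: a few corrupted
    examples cannot drag the update arbitrarily far, unlike the mean."""
    return np.median(per_example_grads, axis=0)

grads = np.vstack([np.random.randn(100, 3) * 0.1,   # honest data, gradient ~ 0
                   np.full((5, 3), 50.0)])          # a handful of troll inputs
print(np.mean(grads, axis=0))     # badly corrupted by the trolls (~2.4 per coordinate)
print(robust_gradient(grads))     # close to the honest gradient (~0)
```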
Another thing we talked about, not that much, but I think it's very nice and it's really the point of the thesis, is the trade-off between different forms of efficiency. One is called statistical efficiency in the thesis, which is how well you do your inference of the parameters, essentially how good the model's performance is. Of course this is important: you want to actually learn the preferences and not something very different. But in practice there are other constraints. One is that you have limited computational power, so you also need pragmatic algorithms that are fast. And this is especially hard because the ideal framework presented in the thesis, and my favorite one, is Bayesian inference. But Bayesian inference for this kind of problem is often intractable, because you quickly face an enormous amount of computation. Maybe we can talk about Bayesianism at some point with some guests, but here it's just not doable. So you need efficient algorithms, and typically the thesis looks for algorithms that are quasi-linear in the number of options, for instance. Quasi-linear means that if you have one billion videos, for instance, to rank from top to bottom, then you want the computational cost to be essentially proportional to the number of videos you're considering. This may seem fine at first, but if you think about it, even for videos on YouTube it's already quite impractical: if you have to do this for every user, that's a lot of computation. Yeah, and in that case, when there are N items to be ranked, how many observations from the user do they require? Do they require N times log N observations? So that's the third kind of efficiency, which is called data efficiency: how many data points do you need? It's also sometimes called sample complexity. And it's actually part of the thesis; they discuss so-called active learning. Active learning is when you also optimize the way you query information, so you choose, for instance, which comparisons you want input on. And they propose an algorithm that's N log N, so quasi-linear in the number of options, and it needs on the order of N log N data points. If you think of YouTube, that's not doable. I think YouTube receives approximately this amount of data, but not from each user. Yeah, from the whole population. So you could do a ranking of all YouTube videos according to the whole population, but that's not what's really of greatest interest to YouTube, because they want to personalize recommendations. Yeah, and specifically in the case of a recommender system, the model described in the thesis generalizes beyond pairwise comparisons: instead of comparing two things, you have a small set of things in front of you and you pick one. For example, when YouTube proposes ten videos to you, there is one you choose to watch, and the comparison model can easily be adapted to this scenario. Yeah. Another thing that comes to my mind here is that when a set of videos is proposed to you, the order in which they appear is critical: you barely see the last videos. So this violates, well, it's related to independence of irrelevant alternatives, but it's a bit different. I guess you could model this as noise added to your preferences, but maybe you should actually model it explicitly rather than just assume it's noise. Yeah, I think that's what they do at YouTube. A few months ago there was a talk from someone working on YouTube recommendation, and they told us that it makes a huge difference: most people click on the recommendations, but most people also click on the first three or four items recommended. Yeah.
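Going back to the N log N active-learning point above, here is a hedged sketch, my own illustration rather than the thesis's actual algorithm, of why sorting with a comparison oracle ranks N items using on the order of N log N answered queries rather than comparing every pair.

```python
# Ranking by sorting with a comparison oracle (my own illustration): each
# comparison stands in for a question asked to the user, and a standard
# O(N log N) sort only needs on the order of N log N such questions.
import functools
import random

queries = 0

def ask_user(a: int, b: int) -> int:
    """Stand-in for querying a person; here the 'user' prefers larger numbers."""
    global queries
    queries += 1
    return -1 if a > b else 1  # a comes first if it is preferred

items = list(range(1000))
random.shuffle(items)
ranking = sorted(items, key=functools.cmp_to_key(ask_user))
print(queries)  # on the order of N log2 N ~ 10,000, not the ~500,000 of comparing every pair
```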
Yeah, another thing we talked about yesterday, which is, I guess, more theoretical, but I think it's going to be important eventually, is that the other basic framework for this sort of scoring of preferences, trying to find a score for every alternative, is the von Neumann-Morgenstern theory. In particular, there's the von Neumann-Morgenstern theorem, which says that if you have consistent preferences over probabilistic choices, so you consider that there's uncertainty and you want to be consistent with these probabilities, then the only way to do this, up to isomorphism, is to give scores to the different options. But the scores in this setting have a very precise meaning. If you have option A with utility 0 and option B with utility 1, then having option C with utility 0.9 means that you are indifferent between getting option C with probability 1 and a lottery that gives A with probability 10% and B with probability 90%. So it has a very precise meaning, which seems quite different from the noise model of Thurstone's model and the other models studied in the PhD thesis. And this really raises the question of whether there is a link between the two: is it okay to learn scores using the models of the thesis and then maximize the expectation of these scores as utilities, as is done in the von Neumann-Morgenstern framework? Yeah, so I haven't really investigated this; maybe there is published research about it, but I don't know of any. Yeah, and I think one of the difficulties in learning the parameters of the von Neumann-Morgenstern setting is that, just as people can easily be inconsistent when we ask them simple comparisons, there are also studies showing that when they have to express preferences involving probability judgments, they also do it in a very inconsistent way. A simple example is asking people whether they prefer to receive one million with probability 100% or five million with probability 80%. Here we see that most people prefer to receive one million with probability 100%, which is completely fine. But then we ask a second question: do you prefer one million with 5% probability or five million with 4% probability? Here people switch their preference. And it is actually irrational to switch: it means your preferences don't fit the laws of probability. The reason is that the second case is exactly the first case, with the added twist that with 95% probability you will not be playing at all. So the second case is simply: with 5% probability you are in the first case, and with 95% probability you get nothing. If you are rational, you should therefore make the same choice in the first and second cases. This is not what we observe from people. Yeah, that's called the Allais paradox. And again it raises the question: okay, humans are inconsistent with the laws of probability, but should algorithms be as well? Probably not, I'd say. And it's a difficulty.
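To check the arithmetic, here is a small worked example with an arbitrary utility scale of my own choosing, showing that an expected-utility ranking of the first pair carries over unchanged to the second pair, since the second pair is just the first one played with probability 5%.

```python
# Worked check of the Allais-style questions under expected utility
# (my own numbers, following the discussion above).
def expected_utility(lottery, u):
    """lottery: list of (probability, outcome-in-millions) pairs."""
    return sum(p * u[x] for p, x in lottery)

u = {0: 0.0, 1: 1.0, 5: 1.2}   # an arbitrary utility scale with u(5M) > u(1M) > u(0)

# First question: 1M for sure vs 5M with probability 80% (else nothing).
first_a = expected_utility([(1.00, 1)], u)                 # 1.0
first_b = expected_utility([(0.80, 5), (0.20, 0)], u)      # 0.96

# Second question: the same lotteries, but only played with probability 5%.
second_a = expected_utility([(0.05, 1), (0.95, 0)], u)     # 0.05
second_b = expected_utility([(0.04, 5), (0.96, 0)], u)     # 0.048

# With u(0) = 0, the second pair is exactly 0.05 times the first pair,
# so a von Neumann-Morgenstern-consistent agent cannot switch between them.
print(first_a > first_b, second_a > second_b)  # True True
```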
And I guess this leads to the more general question: what preferences should we really implement in algorithms? What should the objective function of an algorithm be? Should it really be our preferences, or rather our revealed preferences? Well, probably not, I'm going to say it. If there were an international democratic vote tomorrow about, I don't know, climate change or some other controversial topic, I would not be very confident that the result would be robustly beneficial for mankind. So yeah, this points to the limitations of just doing preference learning. Maybe there's something deeper, and the other example I always give is my own preferences when I'm on YouTube. Typically, when YouTube suggests a football video to me, I have a preference to click on that video rather than on the Robustly Beneficial podcast video just below. But I would not necessarily want to have that preference. This is sometimes called a second-order preference, or volition. And volition is sometimes described as what you would prefer to prefer if you thought longer, were smarter, spent more time studying, and were more thoughtful and more caring about the world, and so on. Then you would get this volition, which is likely to be quite different from your immediate preferences. And this was the question: when you're doing preference learning, are you learning people's instinctive preferences of the moment, the System 1 preferences, which we may regret later on? Should we not instead learn something more fundamental? Because doing preference learning, if you think about it, is what YouTube is doing every day, and we argue, or at least I argue, that this is not what YouTube should be doing. So in some sense I am against doing just preference learning. But then there's this question: how do you learn something deeper than the preferences? How do you do volition learning? I think this is a very, very important question, and the set of papers I know of that actually propose algorithms to do this is, unfortunately, empty. Okay. So I'd say we really need to think and reflect on algorithms to do this. Yeah, and I think there are ways to make progress. I think it's going to be very hard to identify exactly what we mean by volition, but maybe we should not be too ambitious right now; we should just make step-by-step progress. One step I can think of is that instead of just looking at what people click on when they're on YouTube, where they are probably more or less in a passive, zombie-like mode, just clicking on the next video, where arguably these are not your true preferences but the result of some kind of addiction, we could distinguish this from something more akin to what you would search for online, or what you would answer when queried in a very formal, very serious manner. In that setting, you might express different preferences than the ones you express by clicking on videos. Yeah, I think YouTube is already doing something quite similar to what you described: the day after you watch a video, it sometimes asks, very rarely, what you thought of the video after a day of reflection, whether it was a positive or a negative experience. Yeah, and another thing is that on YouTube, whenever there's a recommendation, there's a button at the top right of the video you can click, with an option to say you're not interested in this content. And full disclosure, I actually did this for football videos.
So the example of YouTube suggesting football videos to me is now fake news, because I no longer get that kind of recommendation. And arguably this kind of input that I give to YouTube should carry more weight in the learning of my preferences, since it corresponds much more closely to my volition than what I click on. And I think YouTube is doing this, as you said; I think YouTube cares a bit about this. But I feel the academic world is not there yet. We should try to propose algorithms and ideas, and probably work with psychologists too, to better understand what we want algorithms to learn from us. Yeah, so one thing we also discussed yesterday is that with research in psychology, we can identify systematic ways in which humans' revealed preferences diverge from their volition, or from some sort of rational preferences. And if we understand this process much better, from what humans truly prefer to what they say they prefer or what behavior they show, then we can hope to compute the inverse function and, from the behavior, recover the underlying preferences. Yeah, that would be one approach. The other approach we discussed is in terms of counterfactuals. Volition, if I follow what I said earlier, would be what you would prefer if you thought longer or learned more, which is a counterfactual view, a counterfactual thought experiment. And there's research on computing counterfactuals. Arguably, if you change things in the latent space of a neural network, to be a bit technical here, you might have ways to describe this kind of counterfactual inside the network itself, and maybe there are better structures for this counterfactual reasoning. But that would be another way to try to compute something closer to volition. Definitely a research direction I would want to see investigated more. Yeah, another thing we discussed yesterday goes back to the data and computation efficiency we talked about. Having a quasi-linear algorithm is nice, but if you think of complex preferences, for instance your preferences about whether a given video should be moderated or not, this is extremely hard to describe, because the set of options over which you have preferences is the set of all videos, which is combinatorial, something like two to the power of one billion. So a linear algorithm on this set is completely useless; you need sublinear algorithms. And what's pretty nice in the thesis is that the last chapter seems to have nothing to do with preferences, because it's about football predictions, but actually the two are essentially the same problem. One of the problems they had in this last chapter is that they wanted to make predictions for football games between national teams, and national teams don't play that often, so you don't have a lot of data about them. There's a sense in which you have sublinear data: there are many more imaginable games than there are data about these games, so you need a sublinear algorithm.
And another thing they did is leverage the fact that there are a lot of games that are not between national teams but between clubs, which have many of the same players as the national teams, and they use this to do inference. The trick is to use a kernel method. The idea of a kernel method is that you use the similarity between the data you have and the event you're trying to predict: if they are very similar, then you make roughly the same prediction as for the data. And the similarity metric they use is essentially this: if there was a game between team A and team B, and the next game is between A prime and B prime, and if the players in A and A prime are roughly the same and the players in B and B prime are roughly the same, then you should expect the same kind of result from these games. They call it the player kernel, for obvious reasons. And this allows generalization using only sublinear data, which is very nice. This is very similar to how recommender systems anticipate our preferences and know which items and videos we are going to click on: each video is represented in a latent space by a few features, and for each user the system knows which features they prefer most and recommends videos in line with this. Yeah, because there's a near-equivalence, it's not exactly an equivalence, between feature learning and kernel methods. The basic idea of feature learning is that you have a complex object: it can be an image, it can be all the characteristics of an upcoming football game, it can be the video you want to decide whether to moderate or not. It's a very complex object, very hard to analyze. What we quite often do is use a neural network to synthesize the information and represent the object as a vector; it's called a vector representation. Then you can say that two complicated objects, two football games for instance, are similar if their vector representations are similar, in a very precise sense: you can take, for instance, the scalar product, and if the scalar product is large, it means the two are roughly the same. And this is actually very close to a basic kernel; it's a way to implement a sort of kernel.
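As a rough sketch of the idea, and my own simplification rather than the thesis's exact player kernel, here is how one might measure the similarity of two matches from the overlap of their line-ups; the line-ups below are hypothetical.

```python
# A simplified "player kernel" sketch (my own illustration): two matches are
# considered similar when their respective teams share many players, and
# similar matches should receive similar predictions.
def team_similarity(team_a: set, team_b: set) -> float:
    """Overlap of two player line-ups (Jaccard-style similarity)."""
    return len(team_a & team_b) / len(team_a | team_b)

def match_kernel(match1, match2) -> float:
    """Similarity of two matches = average similarity of their line-ups."""
    (a1, b1), (a2, b2) = match1, match2
    return 0.5 * (team_similarity(a1, a2) + team_similarity(b1, b2))

# Hypothetical line-ups: a club match we observed and a national match to predict.
france_club = {"Mbappe", "Griezmann", "Kante", "Pogba"}
france_nat  = {"Mbappe", "Griezmann", "Kante", "Giroud"}
brazil_club = {"Neymar", "Marquinhos", "Casemiro", "Alisson"}
brazil_nat  = {"Neymar", "Marquinhos", "Casemiro", "Firmino"}

# High similarity, so the club match's outcome carries weight for the national one.
print(match_kernel((france_club, brazil_club), (france_nat, brazil_nat)))  # 0.6
```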
And another thing we discussed is that all the applications I've seen so far of these kinds of methods suppose that the feature representation is produced automatically by some fixed neural network: there's no further learning of the network, so you have a fixed feature representation, or equivalently a fixed kernel. But what you could do, I guess, is also learn the feature representation that best explains the many ways people differ in their preferences. My intuition about this, maybe it's wrong, but it's the intuition I and the others had yesterday, is that if you use the same feature learning across different users who have different preferences, then the vector representations will, in a sense, encode all the different ways people look at the problem: the space is going to be stretched according to what people care about. And maybe this can be much easier to interpret. That's something we did not really discuss a lot yesterday, but it's going to be a big problem at some point: you want to be able to trust that the algorithm that has learned your preferences has actually learned your preferences, and this may be very hard. This could be a way to gain trust in the fact that these models are actually learning our volitions, or something like that. Yeah, and this specifically matters in the case of hate-speech moderation, because people want to reject different kinds of speech: something said about religion may not matter at all for one group of people but be very important to another. So we expect some latent features, in the dataset on which we learn the preferences, that are very uncorrelated from one user to another. Yeah, yeah. I think this could also be very interesting for gaining insight not only into the algorithm, but into human preferences in general. Yeah, so I found this reading very interesting; I didn't know much about any of this before reading the thesis. I hope we'll see you next time. Next time, we're going to discuss something quite different: autonomous weapons, which are a very big deal in AI safety because they can be very... They can kill. They can kill. Yeah, so I hope we'll see you next time.