So hello everybody. Welcome to this new podcast. We're going to call it robustly beneficial, at least for now. And what we're going to do is have a series of podcasts, every week or so, where we discuss papers that are relevant to AI safety in general: the safety of algorithms, beneficial algorithms, the ethics of AI and so on. And I'm with Louis. So maybe you can just present yourself quickly. Hi, so I'm Louis. I'm a PhD student here at EPFL, and for six months I have been organizing the AI safety reading group that we have here. Every week we meet, read papers related to AI safety and discuss them. Yeah. And the basic idea here is that we're going to discuss the paper we discussed yesterday. So we're already prepared. We should be. And just to present myself for those who don't know me: I run a YouTube channel called Science4All, in French, where I do a lot of science popularization, in particular around algorithms, artificial intelligence and Bayesianism. I also work at EPFL as a science communicator, and I also do a bit of research related to AI safety. And the paper we're going to be discussing today, I don't remember the exact title, but it's by Diakopoulos. Yeah. It was on algorithmic accountability, and there was a subtitle, something about black boxes. Yeah. We'll put everything in the description. And sorry about this, we'll try to be more prepared in the future. So it was a very interesting paper, which is not from someone who builds algorithms; I think he's a journalist by training or something like that, I don't know. But he really studied the topic, and in the paper it sounds like he really knows about it, even though he's not a practitioner himself. Yeah. So when we discussed it yesterday, you said that you actually did not enjoy the first part of it as much. Yeah, I found the first part less useful in terms of AI safety. But still, the one thing that was very useful is this question, which I think is not asked often enough: given an AI system, what are the consequences of this algorithm? Yeah. Usually, for example, if you think of a recommender system, the designer, the engineer building this recommender system would just say: this is about maximizing the likelihood that someone clicks on the next video or continues scrolling the Facebook feed. And it stops there; they don't ask more questions about the consequences. And the way algorithms are discussed in the paper, they try to understand the consequences of algorithms beyond this, for example in terms of fairness, or in terms of filtering: will there be content that no one is able to see because of the consequences of such an algorithm? Yeah. So one thing we discussed yesterday was the idea of a black box. It's very common, we talk a lot about black boxes, like neural networks being black boxes or algorithms being black boxes. But it struck me when I read the paper that it's never really that clear what we mean by black box. We usually just mean something that we cannot understand, something like this, I guess. And the paper has a more precise definition, which I found very interesting. Yeah, I agree with you. So the basic definition of the paper was that a black box is something you can only interact with through inputs and outputs. And maybe sometimes you don't even know the input.
Maybe it's just the output. And I think it's a useful abstraction, because when you're dealing with algorithms and you want to understand them, there are different approaches. One of the classical approaches, if we designed a simple algorithm, is to prove theorems about it. But usually to do this, you need to know what the algorithm is doing, in particular the code of the algorithm. Now we have algorithms that are more complex, neural networks and things like this. And sometimes complexity may not even be the main reason why the algorithm is a black box, as discussed in the paper. Maybe it's just because it's the algorithm of a company, maybe the YouTube recommender system, for instance, and it's proprietary. And because of this, you cannot know the inside of the algorithm and you can only interact with it through inputs and outputs. And that was very interesting. Yeah, I also thought it was an interesting idea to have two concepts instead of simply the concept of something we cannot understand: algorithms that are difficult to understand, for example because they are huge, like a neural network with a billion weights, and algorithms for which we don't have access to the code. The latter could be a simple algorithm that we would have a chance of understanding, or a very complex one. And I think it's very interesting because in the world we live in today, where there are algorithms deployed everywhere, a lot of these algorithms are actually black boxes, black boxes given this definition. And it could be studied in terms of the theoretical limits we face when studying a black box algorithm. Yesterday we discussed the fact that there are some theoretical limits: if we assume a bound on the Kolmogorov complexity of the algorithm inside the black box, there is some chance of understanding this algorithm given a limited number of queries on the inputs and outputs. We discussed it, we did not prove it, but I think it's an interesting research direction as well. I guess another thing we discussed a lot is the idea of a probing algorithm. So you have this algorithm that you want to study, and the critical challenge, as I see it, is to try to predict what the algorithm is going to do in this or that situation. If you want to know this, then you should analyze the algorithm, and to analyze it, you can try to read it. But sometimes the algorithm, even if it's transparent, is just too long. And especially if it's a black box, you may want to interact a lot with it, and just doing this as a human is maybe not very effective. It can be much more efficient to use an algorithm to probe the black box algorithm, I'd say. And what's interesting with the definition of the black box is that, as we also discussed, you can have different definitions. You can have this input-output setting, where you can plug in an input and see the output. That would be a not-very-black box, I'd say, a weak black box or something like this; I made up a lot of names yesterday. As opposed to something that would be a strong black box, for which you can only observe the output. Maybe you can also have something intermediate where you know the input but cannot choose it. That would be, I guess, a medium black box. It's interesting.
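To make the idea of a probing algorithm concrete, here is a minimal sketch, in Python, of the weak black box setting: a box we can only query on chosen inputs and observe. The `black_box`, `probe` and `predict` names are made up for illustration; a real probing campaign would go through something like a website's API rather than a local function.

```python
import random

# Minimal sketch of the "weak black box" setting: we choose inputs,
# observe outputs, and see nothing else of the system's internals.

def black_box(x):
    # Stand-in for the hidden logic we pretend not to know.
    return 3 * x + 1 if x % 2 else x // 2

def probe(box, n_queries=1000, input_space=range(10_000)):
    """Query the box on sampled inputs and record the observed behavior."""
    observations = {}
    for x in random.sample(list(input_space), n_queries):
        observations[x] = box(x)
    return observations

def predict(observations, x):
    """Very naive prediction: reuse the output of the closest probed input."""
    closest = min(observations, key=lambda seen: abs(seen - x))
    return observations[closest]

obs = probe(black_box)
print(predict(obs, 4242))
```

The point of such a sketch is only that prediction quality is bounded by how many and which queries we can afford, which is exactly the trade-off the discussion keeps coming back to.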
And something to note about why this kind of strong black box would exist: sometimes an algorithm has been used in the past and is not being used anymore. In that case, you see the outputs of this algorithm from the past, but you cannot decide which inputs were given to it. Another reason could simply be that the company using the algorithm also keeps secret what data it uses as input. They tell a story in the paper, which was quite interesting: when probing such a black box algorithm, they would find correlations between, for example, age and the output of the algorithm. But they learned later that age was not actually part of the input. So that shows it can be very hard to infer something about a strong black box algorithm. Yeah. And I guess this also leads to an idea which I find very interesting as a research question: given a model of a black box, like how strong of a black box it is, what are the best probing algorithms you can use? And I think the answer really depends on the assumptions you make about how black the box is. So you can have this research direction where you try to design probing algorithms for each kind of black box. I don't know a lot about research in this direction. I know of things like adversarial attacks on neural networks. So, amusingly, a lot of people say that neural networks are black boxes, and it's interesting to ask in which sense they are black boxes. Because in the sense we gave, in the definition we gave, a neural network is actually not a black box, because you can compute gradients from it. So you can observe more than just the output. And actually many people are working on program verification; we had a PhD student yesterday at the discussion who was doing verification of neural networks, which typically uses more information than just the input-output behavior. So yeah, I think it's really interesting to understand this, and there's probably not enough research on it. I feel like it's an open research direction. Yeah. And I think an interesting outcome would be that, if probing ends up being too difficult, at least we would know that for such algorithms the right strategy is not even to try to probe, but to open the black box itself and try to discover what the code inside is. Just like if you want to generate an adversarial example for a neural network and you only have access to inputs and outputs, it can be very difficult work; in that case, you would put more effort into actually accessing the code and the model itself. So it would tell us what the most efficient strategies are to steer these algorithms in the right direction. Yeah. There was also an interesting point in terms of AI safety: you have this algorithm, should it be transparent? In a sense, I'm definitely overall pushing for more transparency of these algorithms, because I think they are difficult to design, you need help, and you have to make sure that they don't have vulnerabilities, for instance. But just like for any algorithm, if you share the code, you also make yourself more vulnerable to attacks. And that's typically the case for neural networks: if people can run the neural network locally on their computers, they can compute the gradients, and it's much easier to do adversarial attacks. Yeah. So in that case, I'm very hesitant.
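As a toy illustration of that last point, here is a minimal sketch, assuming a one-layer model with known weights, of why gradient access (the white-box case) makes adversarial perturbations easy, in the spirit of the fast gradient sign method. The model and numbers are invented for illustration; with only input-output access, this gradient direction would have to be estimated from many queries instead.

```python
import numpy as np

# One-layer "network" (logistic regression) with known weights: white-box
# access lets us compute the gradient of the output w.r.t. the input and
# craft an adversarial perturbation directly.

rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.1

def model(x):
    return 1 / (1 + np.exp(-(w @ x + b)))   # probability of class 1

x = rng.normal(size=5)
grad_wrt_input = model(x) * (1 - model(x)) * w   # d(output)/d(input) for the sigmoid

epsilon = 0.5
x_adv = x - epsilon * np.sign(grad_wrt_input)    # small step that pushes the score down

print("original score:   ", model(x))
print("adversarial score:", model(x_adv))
```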
I haven't thought enough about whether I should be pushing for more transparency in algorithms or not. Because there are all these bad actors who, if the algorithm were transparent, could somehow completely game it. There was in the paper the example of a spam filter, and we are very lucky that the spam filter algorithm is not easily gamed; otherwise we would have mailboxes full of spam. Yeah. So I guess another difficult thing, and I would agree that not everything necessarily needs to be transparent, and probably there are parts of YouTube that you don't want to make transparent. I guess a key challenge, I'm just thinking of it right now, but maybe a research challenge, is to determine what are the things that should be made transparent. It's a whole question. And there's also a lot of discussion about transparency in cryptography, where the principle is that if you are not transparent, then you're not secure. Because you want the system to be transparent and people to try to attack it, to prove that the system is actually robust and works. So over the long term, if we are building robustly beneficial artificial intelligence, it will have to be transparent, for sure. You say for sure? Yeah, I really think so. Yeah. I think that at least many parts of the code should be more transparent. Typically, in the case of the YouTube algorithm, and I'm going to take the case of the YouTube algorithm a lot in this podcast because I think it's a great example, one thing that I would really want YouTube to be much more transparent about is the objective function of the algorithm, or at least a much larger part of it. Right now, it's a big secret, a big mystery. I've talked to a few people, three or four people from YouTube, and they always say, well, it's something called user engagement, and we cannot tell you exactly what it is. And I think that's not good, because the key to being robustly beneficial, and I think this is recognized by a lot of people, is alignment: you want to make sure the objective function of the algorithm is aligned with something we would want it to optimize. And I think this is so critical, and so easy to get wrong, that in order to be more robustly beneficial, you really need to be transparent about the objective function. Yes, we also discussed, given a black box that we are probing, whether we would be able to somehow infer the objective function of the algorithm inside the black box, and we had no idea how to solve this problem. It's also interesting because we talked about whether you could apply this to any algorithm, or to humans, for instance. I think much of the black box theory that could be developed applies to humans in a sense. I can mostly interact with you only through inputs and outputs. That's not exactly true: if I could scan your brain, if I had a magnetic resonance imaging machine and were allowed to force you into it, your brain would not be a fully black box to me. But most of the time, I interact with you like a black box: I give you inputs and I look at your outputs. I think another key question is the question of trust. You want to trust that the algorithm is going to do what you think it should be doing. And I think it's still possible to gain quite a lot of trust using this black box interaction.
In practice, that's what we're doing for humans, and I'm somewhat confident that you are mostly robustly beneficial. But I guess it also depends on the person. I'm thinking, for instance, of politicians. Maybe they have so many incentives to behave in a certain way, they know they have secrets to protect, that maybe they know they have to defend themselves against these probing algorithms. And we can also talk about black boxes that, maybe in the future, but maybe even right now, are already trying to defend themselves against probing algorithms. You think today, already? I don't think that much today, it's true. Usually, if we tried to probe some algorithms, we would use bots or crawlers that query the website a thousand times per second. Definitely, most big companies, most websites, have things against that. We can see this as some part inside the black box defending against probing attacks. Yeah, exactly. Every captcha is in a sense a defense against probing algorithms. So what's interesting, again with the example of YouTube, is that there are people who designed probing algorithms for the YouTube algorithm. I'm thinking mostly of Guillaume Chaslot, and also Joachim Allgaier. I think it's very interesting because you want to understand YouTube. And here we are not in a fully black box model, because you can still create your own account and look at what it recommends. And if you click on a video, in a sense, it's like querying: what are you going to recommend to me next? So it's a fairly strong black box, I would say: we can create some input, like simulating a fake user browsing YouTube, but we are not completely sure what about this fake user is actually used as input by the algorithm. And we can clearly see the output; the output is the recommendation. So that's why I would not say it's a weak black box, but something in between weak and strong. Yeah, it's medium, like that. And one of the difficulties with the YouTube algorithm is that the recommendation it's going to give you next depends on a lot of things going on in the whole YouTube ecosystem. It can only recommend videos that are on YouTube, and once a new video is uploaded, well, now it can. Yeah, and it will also recommend newer videos, since it's more likely that someone clicks on a video from today than on a video from last year. Yeah. I think in this model, the conjecture I put on my wiki, which we talked about, is that you can learn a black box by interacting with it a number of times proportional to the Kolmogorov complexity of the algorithm, how complex the algorithm is. So if it's one billion lines of code, then with roughly one billion interactions you can eventually learn what's inside. But that only holds in the weak model of a black box, where you have this algorithm and you control, at every interaction, all of the inputs of the algorithm. If the algorithm has another input channel, some other source of data that adds complexity inside the black box, then this does not hold anymore. In that case, that would be all the other users of YouTube who are constantly giving extra information for the algorithm to feed on. Yeah, okay, exactly. Yeah.
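As a rough way to state the intuition behind that conjecture, here is the back-of-the-envelope counting argument, hedged as a heuristic under strong assumptions (a deterministic program inside the box, and query answers worth at most one bit each), not a proved theorem:

```latex
% Heuristic only: assume the box runs a deterministic program p with
% description length (Kolmogorov complexity) at most K bits, and each
% query's answer carries at most one bit of information.
\[
  \#\{\text{candidate programs of length} \le K\} \;<\; 2^{K+1},
  \qquad\text{so identifying } p \text{ needs at least about } K \text{ such queries;}
\]
\[
  \text{the conjecture discussed above is that on the order of } K
  \text{ well-chosen queries also suffice, in the weak black box model.}
\]
```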
So that adds to the difficulty of analyzing what's going on on YouTube. Yeah, definitely. So again, that was part of the question of how we can make these algorithms robustly beneficial. And for something like YouTube, I think probing is very important, and there is not enough of it done yet. But I also don't think it's the ultimate solution that would completely help us understand the recommender system and make YouTube robustly beneficial. I think we should look into the code, change the objective function. So I think it's interesting because it depends on who you are, right? If you're not at YouTube and you don't have access to this, then it's a medium black box algorithm, which means that the best you can do is this, right? But I guess what we're saying is that it's not sufficient; we think it's not going to get us there. If you want to make YouTube robustly beneficial, if YouTube wants to help people make it more robustly beneficial, at some point you're going to have to make it not a black box anymore, or at least much less of a black box, and typically give access to more insight into the code. Yeah, I think there's also something interesting we talked about, which is that the black box model is a lot about the interaction with the algorithm, but there's also this question of priors. You can have strong priors on a black box. For example, if I ask you to write an app for some task, then even if I know nothing about it as a pure black box, if I don't know any output of the algorithm, any input of the algorithm, well, because I know you personally and have strong beliefs that you're going to design the app I have in mind, I have a lot of expectations. I have a strong prior on what the algorithm is actually going to be doing. Okay, but I don't understand, in practice, how do you get to learn this prior? You are not born with this prior about the algorithm inside the black box. So you mean that without having probed, even before starting to probe the black box, you would already have a good idea of what's going on inside? Oh, okay, I remember we talked about it, yes. Another example, to make it clearer, that helps me understand: even though we don't know exactly what the objective function of some recommender system is, we have a very strong idea of what it is. For example, we clearly know that it's not counting the number of clouds in the sky. We know it's most likely counting things like the number of clicks, the number of minutes spent on the website, the number of likes and shares, etc. That's a case where, even before probing, we have a lot of information about the inside of the black box. I think this information is also useful in the analysis you want to make. We're being Bayesian here. But if you want to understand an algorithm, this is part of it: talking to the YouTube engineers is an indirect way of probing the algorithm they have designed, in a sense. I'm also thinking about another thing, which is that in a sense the YouTube algorithm is actually more of a black box than we said, because we don't really observe all of its outputs, and sometimes we can barely analyze them. You can probe it by having an account and looking at the recommendations it gives you. In that sense, it's a medium black box. That's what I'm saying.
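To make that kind of prior concrete (clicks, minutes, likes, shares), here is a hedged sketch of the sort of engagement-style objective we might expect such a black box to optimize. The signals and weights are pure guesses for illustration; nothing here is claimed to be YouTube's actual objective function.

```python
# A purely hypothetical engagement-style objective, written down only to
# illustrate the prior we have about such systems before any probing.
# None of these weights or signals are known to be what any platform uses.

def engagement_score(session, w_click=1.0, w_minutes=0.1, w_like=2.0, w_share=3.0):
    """Toy proxy for 'user engagement' over one browsing session."""
    return (w_click * session["clicks"]
            + w_minutes * session["watch_minutes"]
            + w_like * session["likes"]
            + w_share * session["shares"])

# Before probing at all, the prior says the box optimizes something in this
# family, not, say, the number of clouds in the sky.
example_session = {"clicks": 12, "watch_minutes": 47.0, "likes": 3, "shares": 1}
print(engagement_score(example_session))
```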
There's also another interesting piece of work here, by people at EPFL, about trying to understand radicalization on YouTube. It's a paper where they suggest strongly, at the end, that the YouTube algorithm is recommending more and more radicalized videos to users. But the way they probed it was not through basic input-output interaction with the algorithm. Instead, what they looked at is the users' comments on different videos, and what they observed is that there was a pipeline: lots of users would be commenting on videos that are more and more radicalized. Okay, so they can follow the sequence of videos that a user watched through the comments that were posted. I guess this is a very indirect way of probing the algorithm. Okay, I see. There could be some confounding factors. For example, something else in the world makes people more radicalized, like Trump, let's say, and then they observe this on YouTube. I don't know if there are other observations indicating that it's the role of YouTube, that it is actually a consequence of the YouTube algorithm. It's a good point, for sure. At some point we should probably invite Manuel onto the podcast. Yeah, it's a good point. I haven't read the paper yet, I should, I only skimmed through it. But another point that was made, I don't remember where, is that the YouTubers themselves were not recommending more radicalized videos. And so this suggests that the way people switched to more radicalized videos was through non-human recommendation means. It was not the YouTuber who said, yeah, you should go watch this guy. I guess nobody is using this anymore, I don't even know if it's still a thing, but back in the day, on the YouTube page of a channel, you could suggest related YouTube channels, as the creator's choice. That would be the creator recommending channels. I don't know if this still exists; I'm not sure. Most people are probably not using it; most people just follow recommendations. Like 70% of the views on YouTube result from recommendations of the algorithm. And what they suggest is that it's not humans who are trying to push humans to be more and more radicalized; it has to be something else. But as you say, it could be something outside of YouTube. And these kinds of results were also found on other platforms like Twitter: I think people react more to more polarized content. And because these algorithms use engagement and people's reactions as an objective, it's a fairly clear-cut consequence that if you maximize engagement, you will show more polarized content. Yeah, just to get back to probing algorithms: I've read a lot of papers on polarization recently, and many of them are trying to do some probing, in the sense of figuring out what's going on, maybe not of the algorithm itself but of the whole ecosystem, but sometimes of the algorithm itself. And people like Manuel, or people like Joachim Allgaier or Guillaume Chaslot, I feel like they're all trying to use clever tricks, and that's really nice. But I feel that maybe there's some overarching theory to be made, like how you should probe an algorithm in general, to have something more principled and give people ideas of how to do it. The paper actually does this quite a lot, it gives a lot of examples of how to do it. But yeah, I feel like maybe there needs to be more of a push in this direction.
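As an illustration of the kind of indirect probing described above, tracing users through their public comments rather than querying the recommender, here is a minimal sketch on invented data. The categories, severity scores, and log format are hypothetical; this is only the general idea, not the actual methodology of the EPFL paper.

```python
from collections import defaultdict

# Hypothetical comment log: (user, channel_category, year). Invented data,
# only meant to show how user trajectories can be reconstructed from
# comments without any access to the recommender itself.
comments = [
    ("u1", "mainstream", 2016), ("u1", "contrarian", 2017), ("u1", "extreme", 2018),
    ("u2", "mainstream", 2017), ("u2", "mainstream", 2018),
    ("u3", "contrarian", 2016), ("u3", "extreme", 2017),
]

severity = {"mainstream": 0, "contrarian": 1, "extreme": 2}

trajectories = defaultdict(list)
for user, category, year in sorted(comments, key=lambda c: c[2]):
    trajectories[user].append(severity[category])

# Count users whose commenting activity drifts toward more extreme categories.
drifting = [u for u, t in trajectories.items() if t[-1] > t[0]]
print(f"{len(drifting)} of {len(trajectories)} users drifted toward more extreme content")
```

As noted in the discussion, such drift on its own does not establish that the recommender caused it; confounders outside the platform remain a concern.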
Yeah, I see. But okay, just a thought: I'm a bit afraid that a theoretical result about this might not be useful in practice. For example, if the theory says to pick random points in the space of inputs, we may not know how to create that input for a real system. We are forced to interact through YouTube's API or YouTube's user interface to be able to probe the algorithm, so we are very restricted in how we can probe these algorithms. Another discussion in the paper that was quite interesting was on the side of the law. He describes examples where it was actually legally forbidden, and people got into a lot of trouble after probing some algorithms or some databases to get at data that's out there. For example, I don't know, if we tried to probe YouTube's algorithm, would we get a lawsuit because we are trying to steal Google's secrets? And there is also the possibility that the law forces algorithms to be more transparent, possibly in some specific fashion. I don't know, does GDPR ask for this kind of thing? There's a claim about transparency of how the data is used, but I feel like it's very subject to interpretation in practice. Does it at least allow us to ask for all the data about us? I'm not sure it's exactly that; it's more about how the data was used, or something like this. We would have to ask a legal expert. We also talked about the case of the Gale-Shapley algorithm, which is an interesting example; I don't really know how it was handled in the US, but it could be a fully transparent algorithm because it's a very nice algorithm. It's the algorithm used to assign people, for instance students to universities. Students express preferences, universities may also express preferences about the students, and then you have this matching algorithm that has nice properties. And it's more than transparent: it has additional properties beyond just being transparent. Namely, we can prove theorems about it, which is very nice. You know that it's going to lead to a stable matching. You also know that one side, the students, has no incentive to misreport their preferences. So, annoyingly, in France, when it was applied, the code was not actually transparent. Back when I was trying to probe this algorithm, it was not even a weak black box; it was a very strong black box to me, because I could not give inputs and see outputs, and I could not get indirect information about it. I was preparing a video on this topic and I was pretty sure it was using Gale-Shapley, but I could not verify it; there was no resource that clearly stated "we use the Gale-Shapley algorithm". And there was this story where high school students were complaining about this, and for a while the government was just not releasing the code. At some point they sent the code to the high school students' association, by mail, by physical mail. And the code was horrible, it was very bad code. I think it's a case where it's really a shame: you could have a very nice, transparent, well-designed algorithm, certified by different experts, I don't know. But maybe there's not enough of a reflex to try to make code transparent in general, for sure in many companies, but also in governments. The open source movement is growing, I guess.
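Since the Gale-Shapley algorithm is simple enough to write down, here is a minimal sketch of the student-proposing deferred acceptance procedure in its one-to-one form (each university takes one student). Real admission systems add capacities and many practical constraints; this only illustrates the core idea behind the stability guarantee mentioned above.

```python
def gale_shapley(student_prefs, university_prefs):
    """Student-proposing deferred acceptance; returns {student: university}."""
    free = list(student_prefs)                     # students not yet matched
    next_choice = {s: 0 for s in student_prefs}    # index of each student's next proposal
    current = {}                                   # university -> tentatively held student
    rank = {u: {s: i for i, s in enumerate(prefs)}
            for u, prefs in university_prefs.items()}

    while free:
        s = free.pop()
        u = student_prefs[s][next_choice[s]]       # propose to the best not-yet-tried option
        next_choice[s] += 1
        if u not in current:
            current[u] = s                         # university was free: accept tentatively
        elif rank[u][s] < rank[u][current[u]]:
            free.append(current[u])                # university prefers the new student
            current[u] = s
        else:
            free.append(s)                         # proposal rejected, student stays free
    return {s: u for u, s in current.items()}

students = {"alice": ["x", "y"], "bob": ["x", "y"]}
universities = {"x": ["bob", "alice"], "y": ["alice", "bob"]}
print(gale_shapley(students, universities))        # {'bob': 'x', 'alice': 'y'}
```

The classical theorems are that this always terminates in a stable matching and that, in the student-proposing version, students have no incentive to misreport their preferences.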
But for this kind of question, I'm still worried about a general understanding of what the algorithm would do. I don't think it's within the reach of a large proportion of the population to understand a stable matching algorithm on graphs. So I think how it would be perceived is a completely different question: whether people prefer the situation where humans look at the rankings and decide, or these algorithms that a lot of people won't understand. So I guess most of it is indirect probing: you just ask for experts' opinions, and if enough experts that seem independent enough say it's good, then you start to trust it. Yeah, I guess that's how trust works: you don't trust the algorithm directly. Yeah, it's interesting, this idea of probing algorithms as a way of building trust. It's a good point. Because the paper talks mostly about probing algorithms through direct means, like interacting with the algorithm or analyzing its outputs, but there are also important indirect means of probing algorithms, especially if you don't want to spend the time or are not skilled enough to understand an algorithm yourself. Yes, you can ask people who may know better. And then there's also the question of trust in experts. Okay, cool. Well, thanks for watching, and I hope you'll be here next time. Next time we should be talking about robustness in high-dimensional statistics. There is a series of recent papers about this, which is a very interesting and important topic for researchers to address.