Hello everyone. Today we will discuss a big collaboration called Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. It's a collaboration led by Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield and Gretchen Krueger, with a number of other researchers from OpenAI, Montreal, McGill, EPFL, Ecole Normale Supérieure, Google Brain, et cetera. We will follow the structure of the paper more or less in this discussion: first we will discuss the institutional mechanisms for trustworthy AI development, then we'll move to software mechanisms, and finally hardware mechanisms and specifications. Who wants to start? Maybe Lê or Louis on the institutional mechanisms?

Yes, one of the important problems is of course how to verify and audit the people who are designing AIs, and how to make sure that these AIs are constructed with the right interpretability, privacy and no-bias conditions, and so on. In order to do this, the first section of the paper focuses on institutional mechanisms and what kinds of structures should be put in place. The first big point is a third-party auditing program. This is what happens in the banking industry: you have auditing done by outside bodies, which can be governmental. And for AI, maybe, even probably surely, we need external auditing of some sort to better track whether the algorithms are well designed, and whether they verify the specific conditions we want them to verify.

Yeah, if anyone has more comments on this. The problem is of course, as we already discussed, that this raises a lot of different problems in practice. I guess there are legal problems; this probably needs a legal framework for it to be done consistently, and it's hard to think of other mechanisms through which this could be implemented for actual companies these days. Another problem is that you need the collaboration of the audited company to make available whatever it is that needs to be audited. Actually, there's even the question of what it is that we are auditing. Do we audit the data? Do we audit the learning algorithms? Do we audit the algorithms that have been learned? The objective function of the algorithm is still another thing, and arguably the most important to us. So far, I don't think it would be easy at all. I think it would be a huge enterprise, even for, especially for, big companies like Google. They probably have very intertwined code that is not ready to be audited. But it could also be interesting to push them in this direction, because it would maybe force them to have more structured code. Well, I don't know, maybe it's already very well structured. But if you are aiming for transparency, you have an additional incentive to make your code documented and clean. So yeah, I think everybody agrees that there needs to be some sort of better auditing like this; to which extent and how, the devil's in the details, I guess.

Anything to add about third-party auditing? Otherwise, you can explain the next one. So, yes, auditing the auditors, that's the fifth point; that's what we discuss at the end, I think. The second recommendation of the paper at the level of institutions is to do more red teaming exercises. Red teaming exercises consist in trying to attack the system that you are designing yourself, to test for failures, to test for lack of robustness and resilience.
And so, the authors recommend that developers of AI should learn better skills to perform these kinds of exercises, to be able to catch the possible problems with their software before they happen. One thing we can think of is, for example, last time we discussed DP3T. The authors writing the DP3T framework really made the extra effort to think of how different agents, different stakeholders, could try to attack and break the system. It is extremely important to ask these kinds of questions, because otherwise you could end up in a situation where your system doesn't work and you did not anticipate how it would fail. The authors found that this sort of skill, like threat modeling, is lacking among AI developers and should be encouraged. And this is also something we can recommend to AI developers who are not only doing research but working in large companies. A second example that we often talk about is simply the effect of maximizing engagement, which led to a lot of biases and polarization on social media. This is also a kind of flaw: AI developers should try to see how people will react to their system and anticipate the worst cases that could happen.

So yeah, I don't know how long we can stay on this point, but I wanted to comment a bit on the interpretability part; the report has the usual calls for interpretable, transparent algorithms. I know that Lê has a lot of caveats to add about interpretability.

Well, yeah, it's just that it's ill-defined. As I like to say, interpretability requires interpretation. One sense in which an algorithm could be transparent is if its code is open; but if it's a neural network, its code is essentially the weights of the different synapses. You will learn a few things by having this data, but it doesn't explain much about the algorithm. So there's actually research to be done about what kinds of interpretations of the algorithms can be given. I think it's also very tied to human psychology: what is it that we find increases trustworthiness in an algorithm, in terms of how we interpret it?

The risk with interpretability — so, calls for interpretability are fair, and I think we should always have some level of interpretability. There's also research evidence that adding interpretability constraints actually improves the algorithm. You can see it as a sort of regularization: if you constrain the algorithm to be interpretable in addition to being accurate, you naturally force the training to select less complex models, and you regularize, somehow, by interpretability. You can see it as a sort of sophisticated Occam's razor. So it has benefits; there is empirical evidence for the benefits of interpretability constraints, besides interpretability itself.

Now the caveat is that there are risks of making up a story, or of cherry-picking one explanation of the algorithm, when you start trying to make it interpretable. Take a neural network as an algorithm; this is the algorithm you deploy. Many times people, reviewers especially, don't think of neural networks as algorithms. They talk about the model, the weights, the parameter vector, but they forget, and sometimes it happens to us too that we tend to forget, that at the end of the day this parameter vector will be deployed itself as an algorithm.
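To make this point concrete, here is a minimal sketch, not from the paper, of what it means that a trained parameter vector is itself an executable algorithm. The network and its weights are entirely made up; the point is only that "running the model" is a sequence of multiplications, additions and thresholds.

```python
import numpy as np

# A "trained model" is often described as mere data: a parameter vector.
# Hypothetical weights for a tiny two-layer network (not any real system).
W1 = np.array([[0.8, -0.3], [0.1, 0.9]])   # first-layer weights
b1 = np.array([0.05, -0.2])                # first-layer biases
W2 = np.array([0.7, -1.1])                 # second-layer weights
b2 = 0.3                                   # second-layer bias

def deployed_model(x):
    """Executing the parameter vector: multiply, add, threshold.

    This *is* an algorithm: it prescribes exactly by how much to
    multiply each input, what to add, and how to decide."""
    h = np.maximum(0.0, W1 @ x + b1)   # multiply by W1, add b1, ReLU
    score = W2 @ h + b2                # multiply by W2, add b2
    return 1 / (1 + np.exp(-score))    # squash into a decision score

# Deployment: the "model" runs as a decision procedure on new inputs.
print(deployed_model(np.array([1.0, 2.0])))
```

SGD produced the numbers, but what ships is this procedure, which is the subject of the next exchange.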
So we think of, let's say, the training algorithm as the algorithm — SGD, gradient descent. That's the algorithm we use to train the model, or the parameter vector. But once we're done training, we deploy something called the parameter vector, or the model, and we tend to think of it as a model, a set of parameters, and not as an algorithm. It actually is an algorithm: you execute it. It tells you by how much you multiply this, by how much you multiply that, and by how much you add this to that. And at the end, this is how you decide that the photo shows someone who is likely to be a prisoner or someone who is not, to take a disturbing example from judicial deployments.

And if you start cherry-picking explanations, you might say: oh, it predicts that this person is likely to be a prisoner because he or she wears something that has been observed in members of this criminal gang, or has this tattoo, and we know that the tattoo is present in all the members of this criminal group. Fine. But imagine the many pathways in the neural network: you might start cherry-picking pathways to make up an explanation. So I think we should think of these algorithms, especially large neural networks, as complex systems. And when you're analyzing a complex system, especially if you have motivated reasoning, there's a risk of, yeah...

Maybe I can comment more on that, because I mentioned the word pathway. You can see a neural network as a graph, this kind of object with a lot of nodes and a lot of paths. You have a lot of neurons, and then paths between them. You have features — this is the moustache feature, the trouser feature, et cetera — and then you have pathways between them and the output. We have the same thing in biology, in metabolic networks: you have a graph of reactions between metabolites. This gene expresses this piece of RNA, which is a catalyzer for this protein, and this protein is in turn a catalyzer of this reaction. In the early days of bioinformatics, once we started sequencing the genome, you had a lot of speculative interpretation, where people would start talking about, to cite a recent controversy, "the gene of homosexuality". We now know that there is no such thing as "the gene of" this or that trait; it's not about this piece or that piece or this pathway, it's way more complex. And if you start analyzing a complex system, a complex graph like this, the probability that you are just cherry-picking something is close to one. So you absolutely shouldn't try to explain a complex system by just taking one small pathway and saying: okay, look, there's this tattoo, and because the trousers are blue, therefore this is the prisoner.

Yeah, it's an even more general question about epistemology, about what it is that we mean by an explanation. It's a complicated question; philosophers debate this. So yeah, interpretability really needs interpretation, and hopefully good interpretation rather than misinterpretation. Another point I want to mention about interpretability is the question of interpretable to whom, because some explanations are perfectly valid for computer scientists.
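To give a rough sense of why cherry-picking a pathway is almost guaranteed, here is a back-of-the-envelope computation with made-up layer sizes (a sketch, not a claim about any particular network): in a fully-connected network, the number of distinct input-to-output paths is the product of the layer widths.

```python
from math import prod

# Hypothetical layer widths for a modest fully-connected network.
layer_widths = [784, 512, 512, 256, 10]

# Every unit in one layer connects to every unit in the next, so the
# number of distinct input-to-output paths is the product of widths.
n_paths = prod(layer_widths)
print(f"{n_paths:.3e} candidate pathways")  # ~5.3e+11

# With hundreds of billions of pathways to choose from, finding *some*
# pathway that fits a story you already believe is nearly certain.
```

Even this small network offers half a trillion candidate "explanations", which is the cherry-picking risk just described.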
So if you tell me that a university assignment system, where students are assigned to different universities, follows the Gale-Shapley algorithm, that's fine for me; I can understand it. But if you tell that to the general public, or just to non-computer-scientists, it's no longer clear that it's interpretable to them, even though it's not a very complicated algorithm. And when you're talking about DP3T, or algorithms that are slightly more complicated than Gale-Shapley, the question of interpretable to whom really becomes a big problem. You can spend hours and hours explaining the algorithm — maybe you actually need hours and hours to explain the algorithm to the layman — but you also need their attention for hours and hours, and usually you don't have it. So in the end, these algorithms are somewhat opaque to people who don't spend enough time thinking about them.

Yeah, the authors of the paper mention some research done with data scientists and computer scientists, where explanations were generated and participants were asked to actually interpret them — social studies, somehow, to better understand what good interpretability methods are. But overall, they recommend more research work on interpretability, because as of today there is no framework or clear definition of what interpretability means. Do you think this is important work for the AI safety landscape?

So in a sense, this is kind of the job of science communication; explaining explanations is also the job of science communicators. I spend a lot of time trying to do this, and it's just very, very hard, and the more complex things are, the harder it gets. For very complex algorithms, if I struggle to understand them, we have an even harder time explaining them. There's no simple path, no clear guideline saying: this is how you should explain this concept in science. When concepts are complicated, there are multiple ways to go; you actually need multiple ways to go. And in the end, even if you take the best science popularizers out there, it's still possible that they convey more misinformation than actual information, even though everything they're saying is right, because people misinterpret things. This holds even for very simple steps, like Newton's law, F = ma: even this is very hard to get across. You really need to build the inner feeling that, yeah, it's acceleration that's affected by forces, not speed or velocity. Even this is very hard to communicate. So try to explain neural networks and how they can go wrong, and it gets very complicated.

All right. This was actually one of the points they make in the second section, which discusses software mechanisms to improve trustworthiness in algorithms in general. But to go back to the first section, which was institutional mechanisms, there are two other ideas we did not mention yet. The first one is giving bounties for finding bias and safety issues in algorithms. This is something that is already done a lot for usual programming bugs in software, but it's not done for, say, an algorithm that starts to recommend very problematic content, or for machine learning algorithms in general. This practice is not widespread yet, and they hope that by putting such bounties in place, it will motivate the part of the general public that is knowledgeable in machine learning to look for these bugs and report them.
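Going back to the Gale-Shapley example for a moment, before continuing on bounties: here is a minimal sketch of the student-proposing deferred-acceptance algorithm, with toy preference lists invented for illustration — simple to a computer scientist, yet already non-trivial to explain to a layperson.

```python
def gale_shapley(student_prefs, school_prefs):
    """Student-proposing deferred acceptance (one seat per school).

    student_prefs: {student: [schools, best first]}
    school_prefs:  {school: [students, best first]}
    """
    # Lower rank = more preferred; schools use this to compare proposals.
    rank = {s: {st: i for i, st in enumerate(prefs)}
            for s, prefs in school_prefs.items()}
    free = list(student_prefs)            # students not yet matched
    next_choice = {st: 0 for st in free}  # index of next school to try
    match = {}                            # school -> student

    while free:
        student = free.pop()
        school = student_prefs[student][next_choice[student]]
        next_choice[student] += 1
        if school not in match:
            match[school] = student       # empty seat: tentatively accept
        elif rank[school][student] < rank[school][match[school]]:
            free.append(match[school])    # school trades up; old student freed
            match[school] = student
        else:
            free.append(student)          # rejected: try next school later
    return match

# Tiny example with hypothetical preferences.
students = {"ana": ["mit", "eth"], "bob": ["mit", "eth"]}
schools = {"mit": ["bob", "ana"], "eth": ["ana", "bob"]}
print(gale_shapley(students, schools))  # {'mit': 'bob', 'eth': 'ana'}
```

Twenty-odd lines of code, yet explaining why the outcome is stable and fair already takes real effort, which is the "interpretable to whom" problem in miniature.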
One thing we can think of today is that if someone finds a bug in the YouTube recommender system, there is no reward for simply revealing this bug. What someone could do today is use it to generate ad revenue, instead of reporting it to Google and helping to fix it. So that's why this kind of bounty can be useful.

Yeah. As software gets more and more complicated, the testing of the software usually gets overwhelmingly more costly than the initial development of the software. And yeah, there needs to be a lot of testing, and huge incentives — financial incentives, and social incentives as well, which work a lot in the open source community — to find and fix bugs. This is an important task, and it is all the more important for machine learning algorithms, which have been shown to have a lot of hidden vulnerabilities. Well, maybe El Mahdi knows a lot about this; it's his field.

I was saying in the discussion that I find even more reasons to have bug bounty programs in machine learning than in traditional software. For traditional software, there is basically a piece of code that is much more interpretable, because it was written by hand, or by many hands. So you could imagine a setting where many hands review and audit the code. And sometimes, for the client, it's safer to do this auditing internally and not put it in the public domain through a bug bounty. Of course, in a bug bounty program you don't necessarily share the code. For example, you can do something like penetration testing: I don't share the code, but I invite you to try to penetrate my website. If you find a vulnerability that allows you to penetrate the website, you win the bounty, especially if you can explain why and help fix the vulnerability. So in the case of websites, the website is already public, already out there, so there are incentives to have bug bounty programs, for example on penetration testing. According to a friend who is a penetration tester for a company that deals with banks, this is one of their largest markets: penetration testing of online banking websites. The thing is already there, already public, so you can make it a bug bounty program. For most traditional software, though, you don't want to expose it to external review, so there are incentives for clients not to go for bug bounty programs and to keep all this internal.

Now, for machine learning, one thing that makes it very different from traditional software is that you have the software — say, the weights of the neural network — and you don't know how it behaves on every data point. You are limited by how much data you have to test it on. So if someone in the outside world has a data set, or a distribution of data, that you never had access to, you would be happy to have your software audited with their data set. For example, imagine you trained your neural network on data coming only from American cities, and someone has a data set of people living in India or in Japan. They can test your software on it and say: okay, look, you have a bias against this or that feature which is not very present in the American data set. So this is something where you can see beneficial outsourcing of auditing.
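A minimal sketch of what such an outside audit could look like, assuming the auditor has only black-box access to the model plus their own labeled data; the model, data and group labels below are all hypothetical stand-ins, not anything from the paper.

```python
from collections import defaultdict

def audit_by_group(predict, examples):
    """Black-box bias audit: compare accuracy across data groups.

    predict:  the deployed model, called as predict(features)
    examples: list of (features, true_label, group) from the
              auditor's *own* data set, e.g. group = "US" / "India"
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for features, label, group in examples:
        totals[group] += 1
        hits[group] += int(predict(features) == label)
    return {g: hits[g] / totals[g] for g in totals}

# Toy stand-ins: a "model" that only works well on US-like inputs.
model = lambda x: x["income"] > 50  # hypothetical decision rule
data = [({"income": 60}, True, "US"), ({"income": 40}, False, "US"),
        ({"income": 30}, True, "India"), ({"income": 20}, True, "India")]

print(audit_by_group(model, data))
# {'US': 1.0, 'India': 0.0} -> a red flag worth a bounty report
```

The developer never sees the auditor's data; they only learn that accuracy collapses on a population they never trained on, which is exactly the kind of finding a bias bounty could reward.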
And especially, for example, auditing for bias, or even for inaccuracies that people do not necessarily call bias — just for robustness and accuracy of the algorithm — you can see a lot of reasons to adopt bug bounty programs for machine learning. And in terms of vulnerabilities, as you said, there are several other reasons why machine learning should probably embrace the bug bounty spirit: the space of vulnerabilities is extremely large. We deal with very high-dimensional spaces, of data and of models, and you can't just go through everything internally, auditing and reviewing it all one by one. So crowdsourcing audits through bounty programs could be efficient.

I think related to this is the fact that machine learning, especially neural networks, or really anything that's gradient-based, is more vulnerable if the attacker knows the code, or the weights in particular, because then he can compute gradients and mount these so-called evasion attacks. So transparency opens up vulnerabilities. It's not clear what should be transparent and what should not be transparent for safety reasons; and I'm not just thinking about intellectual property here. This should also, I guess, be better understood, to see what the optimal tradeoff is between transparency and security. The overall goal here is security and trustworthiness, and it's not clear that a differentiable model should be made transparent.

Yeah, we discussed this topic in the first episode of the podcast, discussing accountability of black boxes. Another thing we did not discuss earlier: having the data, or the logs of what has been done by the algorithm, can still be extremely worthwhile, especially if you think of the YouTube recommender system. Understanding what is recommended by the platform right now is nearly impossible for outsiders; it's very hard. You need a lot of tricks, like Guillaume Chaslot does. It's not easy. So it would help if there were more collaboration between YouTube and research entities. There's also this GDPR problem, because the data is very hard to make both useful and genuinely private. We talked about the example of the Netflix Prize, where Netflix made some anonymized data public, but it was de-anonymized afterwards, because you could connect it with what people commented on another website, IMDB. So this deserves careful thought: how to better understand what the YouTube recommender does, and how to allow researchers, for instance, and not the general public, to export this data, to better audit and improve the algorithm if something is very bad. It's not easy, but it sounds quite desirable. As opposed to today, where what some researchers do is try to find some API. It's much harder to get this kind of data, for instance to understand the radicalization paths on YouTube, as has been done by one of our colleagues at EPFL.
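Before moving on to audit trails, here is a minimal sketch of the evasion-attack point above, in the style of the fast gradient sign method, on a toy linear classifier with made-up weights. It is purely illustrative, not an attack on any real system, but it shows why knowing the weights makes evasion cheap.

```python
import numpy as np

# Hypothetical deployed linear classifier: decide (w.x + b) > 0.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
classify = lambda x: (w @ x + b) > 0

x = np.array([1.0, 0.2, 0.4])        # an input classified as positive
print(classify(x))                    # True

# If the attacker knows w (transparency!), the gradient of the score
# with respect to the input is just w. FGSM-style evasion: step each
# coordinate against the sign of the gradient, flipping the decision
# while keeping the input close to the original.
eps = 0.8
x_adv = x - eps * np.sign(w)
print(classify(x_adv))                # False: the decision flipped

# Without the weights, the attacker would have to probe the model
# blindly; with them, one gradient computation suffices.
```

This is the tradeoff just mentioned: publishing a differentiable model hands every attacker its gradients.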
What you mentioned is one of the recommendations about software mechanisms: leaving audit trails, that's the term they use. One thing they recommend is also to do research on what kinds of logs would actually be most meaningful. There are many ways an algorithm can keep track of what it is doing, and presumably some are more meaningful for auditing later on. One reason this is really related to trust is the comparison they make with airplanes. In an airplane, there is a flight recorder keeping track of everything that is happening in the airplane, of many metrics. These recorders are part of why we really trust planes: thanks to them, for the few accidents planes have had, we could get back to the source of the accident and fix it. I think a plane would be a lot less trustworthy if, when there was an accident, we could just say "we don't know what happened" and stop there.

Logs like these definitely exist at YouTube or Facebook. Yeah, but maybe part of the problem is that the logs of YouTube and Facebook are absolutely huge. For planes, they measure a specific set of metrics, and I think we could come up with metrics that make logs more easily understandable. For example, YouTube could log every video it recommends, but it could also log the way it takes decisions over time. For example, if it has a vectorized parameterization of every video, and we know these parameters correspond to certain dimensions — what topic the video is about, how long the video is, et cetera — then logging this kind of metric could be better for auditing the algorithm than simply logging what the algorithm did.

Okay, should we move to hardware, or do we still want to discuss something? Okay, let's go for hardware. One of the points they raise in the article, which is a very natural point to raise, is that you have to make sure your hardware is secure, in the sense that it is running what it is supposed to run, and only what it is supposed to run. We briefly touched on that in our previous episode, where we mentioned something that is not present in this report: the very low-level vulnerabilities you can have in the CPU. But again, when discussing the Trustworthy AI paper, we more or less all agreed that in the threat model for most machine learning, there is so much threat on the high-level software part that this is probably not one of the top priorities in terms of risk, for now. I don't know what you recall from the discussion on that. And this was also a topic of your PhD: even if one third, or 20%, of the hardware you are using gets hacked and starts behaving differently, you can still implement systems that are robust and resilient to this kind of situation.

Yeah. So I think we can think of different attacks on hardware. The worst kind of attack would be a backdoor installed by the designer, as we discussed. This is very hard to defend against, and maybe there are other priorities: for a field that is still struggling with the average and the median, I think this is not yet the most likely cause of vulnerability. We still have far more obvious vulnerabilities in machine learning. And then there are other attacks; maybe some part of the hardware can be hacked, and well, even this is not the most likely threat model, let's say. But one that's much more reasonable is that part of it crashes.
And this probably happens all the time, especially as you get bigger and bigger systems. But yeah, you can also mitigate this with the kind of research done by El Mahdi. Nevertheless, one of the topics they mention, something we can do to make hardware more secure, is what they call... Trusted execution environments? Yeah, exactly. So, trusted execution environments: it's well known how to build and use them for the usual hardware, like the CPUs we use all the time. But machine learning uses more and more specialized hardware, and we can expect that in the future there will be new specialized hardware designed specifically for machine learning and artificial intelligence systems. The problem is that creating such a secure environment incurs a cost, and what they think will happen is that, as we keep updating the hardware we use, this cost will need to be paid each time to create a secure environment on the new type of hardware. So yeah, the authors recommend that we work towards building this kind of secure environment on hardware, but they recognize that it's a costly thing to do, specifically for AI, which updates its hardware often. One of the people in our discussion shared a couple of papers showing that there is ongoing research into making GPUs in particular — the graphics processing units used to parallelize the computation in neural network training — into something more like a trusted execution environment.

Another thing discussed in this section was providing academics with more hardware. Typically, to test something like the YouTube algorithm, or other very big systems like that, you're going to need to test through a lot of angles, and this requires a lot of computational power. Also, if you want to just keep up with the latest developments in AI — these days, the big papers, things by OpenAI for instance, represent, I don't know, was it thousands, hundreds of thousands, maybe millions of dollars of compute?

There's maybe a connection to be made with the recent calls for reproducible research in ML. We see it in the submission platforms of NeurIPS and ICML, which are the two major venues for machine learning: you now have incentives to explain how reproducible your results and your code are, and you have to provide helpful information for people who would like to reproduce your work. Two years ago, one year ago, there was an effort on reproducibility led by Joelle Pineau, which was to take the code from submitted papers and use it for student projects where students would verify the reproducibility of the work. And within this discussion, a member suggested — and I remember, when Joelle Pineau was giving this talk on reproducibility at EPFL, I asked her: if you want to have built-in reproducibility guidelines and checkpoints in the submission process, and you want reviewers to check reproducibility, why not provide computing power to academics who would like to verify it? Especially since for some papers — of course, this is not yet the average paper, but there are exceptions — the compute is enormous. Recently there was a paper, from DeepMind I think, and Ben Recht was ranting and explaining that to verify these experiments, you would need a million dollars, and you can't expect academics to verify such claims or results.
So one thing you can imagine is that you could require, or incentivize, companies to provide computing power for academics when reviewing their work. But again, I don't know what the cost-benefit is; it could be an idea. Yeah, but it's not clear; aren't you going to need some legal instrument? I think beyond the legal and the financial parts, there is also the career incentive part. I don't see any academic being incentivized by their employer to spend their time verifying reproducibility. We are not even incentivized to verify the reproducibility of other academics' work. So I don't see a world where we would be incentivized to reproduce a company's work; it's not even our own work. So yeah, beyond the financial and legal parts, I think the real bottleneck is academic incentives and career incentives. That would block any hope for this part.

Yeah, there's this reproducibility concern, but I guess there's also a competitiveness concern, and also a concern for the ability to audit large-scale systems. And just getting this computing power is going to be costly at some point. Though in computer science, most of the cost of an academic is, I'm guessing, still wages, as opposed to some other fields: in biology, they have very expensive equipment. Compared to that, I don't know how computing power would compare to a new MRI machine or things like that. I don't know; I think it's about equal. I know some colleagues at EPFL who spend about as many CHF on computing power as on their salary. And I expect that for people working with even bigger systems, the price of the salary doesn't matter compared to the price of the hardware they use.

Okay. Do you want to talk about specifications? Yeah. Or auditing the auditors. Oh, you can talk about that. So yeah, now I guess we're going to talk about things that are a bit beyond the scope of the paper. One thing we discussed is the problem of auditing the auditors, and I think it's a neglected challenge, because I don't think it's easy at all to understand whether auditors are doing what they're supposed to do, but also whether they are actually auditing the things they should be auditing. We're talking about trust, and there are two issues. One is: is the person willing to do what we would want them to be doing? And the other is: are they competent enough to do the things we want them to be doing? I think both should be evaluated at some point. And to see whether auditors are doing the right job, you can ask a few questions about the banking industry, or about health standards and hygiene in restaurants: it's not clear that auditors there actually audit as they should. There's this importance of having a good relationship with the person you're auditing — it's actually instrumental to doing good auditing, you want to create trust between the auditor and the auditee — and because of this, you get all sorts of human challenges, I guess. And in the case of algorithms, an additional problem is that you can actually wonder whether what the auditors are searching for is what's most relevant. For instance, to take a very concrete example, for contact tracing there are a lot of very diverging opinions.
For instance, what the WHO is kind of recommending is not necessarily aligned with what the Human Rights Council is saying about this technology. There's a tension here. And it can get very, very complex if you move to a system like the YouTube algorithm: what should it recommend? Or for LinkedIn, for instance: should it recommend this kind of job as many times to men as to women, even though in some fields, for instance in programming, it turns out that there are a lot more men than women? So what does this mean? Well, yeah, it's just the well-known tension in fairness between individual fairness and group fairness. It's not that easy to know what should be done.

And this, I guess, leads me to the other point, which is the point of specifying what it is that we want from these algorithms. I fear it may be neglected — at least the paper does not talk about it, and I can understand that it had other priorities — but I personally feel that specification is a huge part, a critical part, of building trustworthy algorithms. If you want a collaborator for a given project, and you want to trust him, you want to make sure that his incentives are somewhat aligned with what you want to do; I think that's part of trustworthiness. And if he has hidden motives, then it's not going to be good for you. The alternative, when you don't trust your collaborator, is that instead of giving her or him the mindset of the company, the spirit of your intentions, you start giving them more and more detailed to-do lists of tasks. And eventually this becomes very costly to do. You'd always look for aligned collaborators, not for collaborators with whom each of you spends a lot of time just specifying tasks and exact intentions.

Yeah. I feel like the paper may have started from the assumption that the specifications are known, or not too hard to specify, and that it's then mostly a matter of auditing and verifying that they are satisfied. But the specifications are a huge mess. If you think about concrete problems, even contact tracing, but especially if you move to recommender systems, just specifying what it is that we want these things to be doing is huge. Any specification is going to be extremely complex; just writing down the specification is probably going to be impossible. And even if you write it down, if you write an algorithm, there's probably going to be specification gaming. There's a recent DeepMind piece about this; it's also called reward hacking. If your specifications are somewhat what you want, but not exactly what you want, then the system can fully satisfy all of your specifications and still turn out to do something that is not what you wanted it to do. So I feel like thinking about specification in general is neglected.

Okay. Good for today? Yes. Cool. Thank you. And so next week, what will we discuss? So, thanks for listening to the podcast. Next week we will discuss the paper titled Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. Today we talked a lot about auditing, but it's still unclear what auditing actually means, so I'm looking forward to an interesting discussion on how to better audit algorithms, and maybe do it with the right specifications. Bye. Bye.