Hello everybody, in this session we'll discuss the complexity of agreement. Agreement is a problem tackled by many communities: social choice theory, distributed computing, theoretical computer science. In this case it's a theoretical computer science approach to Bayesian agreement. We'll base the discussion on a paper published at the Symposium on Theory of Computing (STOC) in 2005 by Scott Aaronson, called "The Complexity of Agreement". It tackles the question of two Bayesian agents trying to agree: you can compute how many bits of information they need to exchange to reach an agreement, and study related questions of complexity theory in terms of communication or time.

So the basic idea is that the two Bayesians initially have the same prior; maybe they were born with the same prior. And then they collect very different data. Imagine that one Bayesian is going to live his life in one way, while the other Bayesian is going to live her life in a very different way. So they collect very different data, and because of this they have different posteriors. So if you ask them, for instance, what is the probability that France will win the next World Cup, then maybe they will disagree, because they had access to different data. They give different probabilities: maybe one is going to say 10% and the other is going to say 50%.

Maybe, just so that people who are not familiar with Bayesian language can follow us, since in some ways this comes from another community: what do you mean by a prior? When we say this Bayesian has a prior on France winning the World Cup, or she has a prior and then she updates it and gets a posterior, maybe just introduce this language for us.

Yeah, sorry about that. An agent is Bayesian if he or she follows the laws of probability to determine what to think. It's a fully probabilistic agent; "Bayesian" is the name we give to such probabilistic agents. And to apply the laws of probability, to know what you should think after observing some data, you need to apply the equation called Bayes' rule. And to apply Bayes' rule, you need a prior. The prior is what you would think before looking at the data. So initially the two agents agreed that France had, say, a 10% probability of winning the next World Cup, the World Cup in 2022, let's say. Then, as they collect more and more data, for each piece of data they're essentially going to apply Bayes' rule. Bayes' rule is an equation that says how to compute what you should believe just after you've seen the data. This posterior distribution is an update, a change compared to what you used to believe before looking at the data. So there's this update rule, called Bayesian inference, that improves your beliefs, in a sense, now that you know the data.

So, let's call the two agents Alice and Bob. If Alice sees no data, or very little data, or data unrelated to football, then maybe she's not going to change her prior a lot. After years and years of learning about mathematics and school stuff, but not looking at football, she will still conclude that France has a 10% probability of winning the next World Cup, because that's what she used to believe.
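To make Bayes' rule concrete, here is the equation with made-up numbers, purely for illustration (the likelihoods are not from the paper):

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H)}$$

Here $H$ is the hypothesis (France wins the next World Cup) and $D$ is some observed data. Starting from the shared prior $P(H) = 0.10$, an observation that is nine times more likely if France is on track to win, say $P(D \mid H) = 0.9$ versus $P(D \mid \neg H) = 0.1$, gives $P(H \mid D) = \frac{0.9 \times 0.1}{0.9 \times 0.1 + 0.1 \times 0.9} = 0.5$: exactly the kind of jump from 10% to 50% described next.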
Maybe on the opposite side, Bob has lived a very different life. Maybe he's followed football carefully. He's seen data about this football player called Mbappé, who has become very good, and he sees that Mbappé is improving year after year. So now, given the data he has seen, he's going to change his belief, and he's going to say: well, actually, now I believe that France has maybe a 50% probability of winning the next World Cup. He doesn't really decide to change; rather, by applying the laws of probability, he arrives at this conclusion.

And so now imagine that Alice and Bob meet one another again. They are in disagreement, not because either of them was thinking in a bad way, but just because they were exposed to different kinds of data. And the question of Aumann, actually a question that goes back to Aumann in 1976, was: will Alice and Bob agree, if they get to communicate and if they try to agree on the probability that France will win the next World Cup? Aumann's answer is yes, they can agree. And there is a very simple protocol, which is to just share all of the data. Alice will tell everything to Bob. Here I'm assuming that the two Bayesians are fully honest, and that they fully trust each other. If they share all of the data, then Alice knows everything Bob has seen and Bob knows everything Alice has seen, and applying the laws of probability forces both of them to arrive at the same posterior distribution once all of the data, Bob's and Alice's, is known. So they reach a common conclusion (see the sketch after this passage). Aumann proved, and it's actually a very straightforward theorem, that two Bayesians cannot agree to disagree if they had the same prior initially.

But what Aumann did not answer is: what if they have a huge amount of data? In practice, we humans collect a huge amount of data, especially after years and years and years, and we cannot communicate all of it, maybe because it takes too much time. If there are terabytes to be transferred, maybe you just cannot transfer that amount of data. So Aaronson's question was: suppose you have two agents that have learned from huge amounts of data, maybe even more than terabytes, maybe exabytes, huge amounts of data. Can they agree quickly, without transferring most, or indeed hardly any, of this data? And Aaronson's mind-blowing answer is that yes, they can agree efficiently. In fact, the number of bits of information they need to exchange is independent of the amount of data they collected. So even if you have two agents that collected as much data as the size of the universe, as opposed to two agents that collected only 10 bits of data, the amount of communication they need to agree is going to be the same, or of the same order. And this is really mind-blowing. When I discovered this, I did not believe it. I had to read the proof to be convinced, and then I had to re-read it to understand the proof. But yeah, it's one of these really remarkable results in computer science. And in terms of philosophy, it speaks to a philosophical question: can communication be made efficient enough for us to agree on things? Here you have a very straightforward, very compelling answer. With a few caveats, of course, because applying Bayes' rule is complicated in practice.
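Here is a minimal sketch of Aumann's share-everything argument, in a toy model with made-up likelihood ratios (none of this is from the paper; it just illustrates why pooled data forces a common posterior when observations are conditionally independent):

```python
def posterior_from_odds(prior, likelihood_ratios):
    """Posterior P(H | all data) from the prior P(H) and per-datum likelihood
    ratios P(data_i | H) / P(data_i | not H), assuming conditional independence."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # each independent observation multiplies the odds
    return odds / (1 + odds)

prior = 0.10                # shared prior: P(France wins) = 10%
alice_lrs = [1.0, 1.0]      # Alice's data is uninformative about football
bob_lrs = [2.5, 2.0, 1.8]   # Bob's observations favor H (Mbappé's form, say)

print(posterior_from_odds(prior, alice_lrs))             # ~0.10: Alice keeps her prior
print(posterior_from_odds(prior, bob_lrs))               # ~0.50: Bob has updated
print(posterior_from_odds(prior, alice_lrs + bob_lrs))   # pooled data: same for both
```

The posterior depends only on the shared prior and the pooled data, not on who observed what; so once everything is shared, disagreement is impossible.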
Despite those caveats, it's still a strong indication that if we at least try to be Bayesian, then we can quickly agree.

Yeah, maybe it deserves some more explanation, what it means exactly to agree or to disagree. It's something that humans often do during debates. Someone believes X, some other debater believes not-X, and then they will discuss for some time, and the outcome often ends up being: oh, so you believe X, I believe not-X, let's agree that we disagree on that question. But for Bayesian agents, it won't stop there. Bayesian agents will not stop at "you believe X, I believe not-X, let's agree to disagree", because they do this thing called meta-updating. If you observe that a Bayesian agent believes something different from what you believe, it should update your beliefs. And it's something that we can also do in real life with humans. If you see a human that believes something different from you, it should push you to ask questions about what you actually believe. Am I correct? Is that person more correct than me? If it's my professor at the university, I will most likely update my beliefs towards what he or she thinks. But if it's someone with absolutely no credentials, or someone I don't know at all, I might decide to update what I believe a lot less based on this. So when two Bayesian agents interact, and know that both of them are high-quality Bayesian agents, then they are forced to update towards one another. That's why it's not rational, for Bayesian agents, to agree to disagree with another Bayesian agent.

Yeah, and one thing we can discuss as well is the protocol, the debating protocol between two Bayesians who try to agree. Because it's not what you would recommend for debating in general. We tend to think of debates as something very sophisticated, where you have to push arguments, and the reasons why you believe things, and the key data. But the protocol analyzed by Aaronson is funny in how different it is from all of this. Essentially, Alice is going to say what she believes. Bob is going to listen to Alice and say: oh, now I know that she believes this. And he's going to do the meta-updating you're talking about: he's going to apply Bayes' rule to update his beliefs. And then he's going to say what he believes. Then Alice is going to listen, update, and form a new belief, and she just says what she believes now. And you have this back and forth where everyone is just saying what they believe (see the toy simulation below). And it sounds like very bad advice for debating: you don't just say what you believe. It's just that the problem has been stated in a formal enough way that you can compute probabilities. You agree on precisely which question you're debating, and then each of you reports a precise probability, and then I update mine.

Yeah, but it is good advice in one respect. Usually we debate about things, and sometimes the debaters don't even know what they're debating about anymore, because it's going into all sorts of directions. And it's useful to just pose a concrete question: can we at least agree on what we're debating about? And choosing a probability, I think, is a very good way to just remove all the things that are not that important, or confusing, and just say: well, let's bet on what's going to happen in two years, or something like this. What probabilities would you put on it? I think it can help to clarify a lot of debates.
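To make the back-and-forth concrete, here is a toy simulation of the announce-your-posterior protocol, in the style of Geanakoplos and Polemarchakis' classic analysis of Aumann-type agreement. The world states, prior, and partitions below are all made up for illustration; it's a minimal sketch, not the paper's construction:

```python
from fractions import Fraction

# Toy model: 4 equally likely world states; f(w) = 1 if France wins, else 0.
# Alice and Bob share the prior, but their data induces different partitions.
states = [0, 1, 2, 3]
prior = {w: Fraction(1, 4) for w in states}
f = {0: 0, 1: 1, 2: 0, 3: 1}
alice_part = [{0, 1}, {2, 3}]  # Alice's data tells her which cell holds the truth
bob_part = [{0, 2}, {1, 3}]    # Bob's data induces a different partition
true_state = 3

def cond_exp(subset):
    """Expectation of f given that the true state lies in `subset`."""
    mass = sum(prior[w] for w in subset)
    return sum(prior[w] * f[w] for w in subset) / mass

def cell(partition, w):
    return next(c for c in partition if w in c)

public = set(states)  # states consistent with all announcements so far
last = {}
for _ in range(10):
    for name, part in (("Alice", alice_part), ("Bob", bob_part)):
        belief = cond_exp(cell(part, true_state) & public)
        print(f"{name} announces {belief}")
        # Everyone learns which states would have produced this announcement:
        public = {w for w in public
                  if cond_exp(cell(part, w) & public) == belief}
        last[name] = belief
    if last["Alice"] == last["Bob"]:
        break
# Prints: Alice announces 1/2, Bob announces 1, Alice announces 1, Bob announces 1.
```

Each announcement is just a posterior, yet it leaks information about the data behind it: after hearing Bob say 1, Alice can rule out half of the remaining states and jumps straight to agreement. That is the sense in which the numbers themselves, not the raw data, do the communicating.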
But then, for humans, I would not recommend doing exactly this. I think it's useful that everyone says what they believe, but because we humans are not very good Bayesians, not perfectly honest, and not fully trusting of one another, Aaronson's communication protocol may not be very efficient for humans in practice, unfortunately.

Yeah, maybe another thing I can discuss is the proof of the fact that the two Bayesians will quickly agree in this case. The exact proof is quite technical, and the paper is a bit hard to read, but the idea of the proof is actually very simple. The reason why this works is that you can imagine a third observer, Eve for instance, who is listening to the debate. All she hears is Alice saying a number, then Bob saying another number, then Alice saying another number, and so on. And let's assume that Eve knows nothing; she can be fictitious, she's just there for the sake of the proof. Eve has the same prior, she has no data, and she just observes the debate.

Now it turns out that there's a theorem in Bayesian probability theory that says that if Alice knows strictly more than Eve, then whatever Alice says, Eve has to believe it. Again, it's a very weird theorem to think about, because it's the argument from authority, you could say, or at least a version of it. It does require a few assumptions: for instance, Alice and Eve in this case must be Bayesian, they must be honest, and they must fully trust one another, which are strong assumptions in practice. But you have this theorem that says, from the laws of probability: if Alice has strictly more data than Eve, if they had the same prior, and if all of this is common knowledge (Eve knows that Alice has more data than Eve, Alice knows that Eve knows, and so on), then it's a theorem that whatever Alice says, Eve has to believe it.

And in the case of the debate between Alice and Bob, when Alice says "well, I believe it's 10%", Eve knows strictly less than Alice at this point, because we assumed that she has strictly less data. And so Eve must believe what Alice just said; Eve must say: OK, so now I believe it's 10%. And now Bob comes in and Bob says: no, actually I believe it's 50%. It turns out that at this point, Eve knows strictly less than Bob, because all that Eve knows is no data plus the first message that was communicated. Eve only knows that Alice initially said 10%, but Bob knows this too, because Bob was listening to Alice as well. So Eve knows strictly less than Bob, and now she should believe whatever Bob says. So if the debate went: Alice said 10%, then Bob said 50%, then Eve should first believe 10% and then 50%, and so on. Then if Alice says 20%, Eve should believe 20%; if Bob says 40%, Eve should believe 40%; and so on.

So if you look at Eve's beliefs, they're going to go back and forth; they're going to ping-pong. And there's another theorem that you can prove, which says that a belief cannot oscillate too much: the sum of the squared variations of the belief must be smaller than the variance under the prior. Eve's successive beliefs form a martingale (each new belief is, in expectation, equal to the previous one), and such a sequence cannot oscillate forever.
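One standard way to state the bound alluded to here, as a sketch: assume Eve's successive beliefs form a martingale $(X_t)$ taking values in $[0, 1]$. Then a telescoping identity gives

$$\mathbb{E}\left[\sum_{t=0}^{T-1} (X_{t+1} - X_t)^2\right] = \mathbb{E}[X_T^2] - \mathbb{E}[X_0^2] = \mathrm{Var}(X_T) - \mathrm{Var}(X_0) \le \frac{1}{4},$$

using the martingale property $\mathbb{E}[X_{t+1} X_t] = \mathbb{E}[X_t^2]$, the fact that $\mathbb{E}[X_t]$ is constant, and the fact that a $[0,1]$-valued random variable has variance at most $1/4$. So, in expectation, there can be at most about $1/(4\varepsilon^2)$ rounds in which Eve's belief moves by more than $\varepsilon$, which is exactly the kind of $1/\varepsilon^2$ behavior discussed below.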
So at some point she has to settle somewhere. And if Eve settles somewhere, it means, because she believes whatever Alice and Bob say, that Alice and Bob are essentially saying the same thing. And that's how you prove that Alice and Bob will agree. I hope that was clear enough, but I think it's really cool that there's a relatively simple proof, one I could more or less explain here. And it's very insightful, I think, very deep.

Yeah, one point to note about this result is that the paper doesn't talk about exact agreement, that is, having exactly the same beliefs, because that could in some situations still require an exponential amount of communication, which matters since we're talking about complexity. Instead, the paper uses a slightly relaxed definition of agreement. It considers what's called (ε, δ)-agreement, which means agreeing up to a distance ε: the two Bayesian agents may still disagree, but their disagreement is within a very small difference ε. And the δ in there is that the protocol for agreement is not always sure to succeed: there is a small probability δ that the agreement won't be reached.

Yes, Mahdi? Yeah, so it's (ε, δ)-agreement: you want to agree up to ε with very high probability. This probability of agreement can be made arbitrarily high, and ε can be made arbitrarily small. So it's near-agreement: as close as you can get to agreement without being guaranteed to achieve it exactly.

I don't know if you want to add more things on the paper, but before we wrap it up, I wanted to get to how other communities talk about agreement, because they have come close to these results. I also think it would be great to talk about federated learning, Byzantine resilience, and distributed systems in general.

Yeah, so maybe before going to this, we can mention another result of the paper, which is what happens if you have multiple agents: Alice, Bob, Charlie, and so on. So if you have n agents and they get to communicate in some constrained way (maybe Alice cannot talk directly to Dave and has to go through Bob, for instance), then Aaronson proved that in this case you can still achieve agreement, but it's going to take longer. Essentially, and I'm skipping a few details, so it's not exactly what I'm going to say, you need about n times more communication rounds than in the case where there are only two agents.

And I don't know if we mentioned it, but essentially, when there are two agents, the number of communication rounds you need to agree up to ε is of the order of one over ε squared. So if you disagree about the World Cup up to one percent, for instance, then the number of communication rounds that you need is roughly of the order of 100 squared. It's a bit more than that, but essentially it's this. So that's 10,000, and it means that agreement is not going to be immediate either: you still need to go back and forth. So agreement between Bayesians still takes a little bit of time.

However, one thing that the paper does not answer, and it's left as an open problem, is the question of whether there could be a more efficient communication scheme than Aaronson's. I think he proved a lower bound, that it could not be better than a certain point, but right now there's a gap in our knowledge. We know that to get ε-agreement, you need to communicate at most on the order of one over ε squared times. And we know that it's going to take at least log of one over ε rounds; that's roughly the number of digits of precision you want to agree on. But we don't know the minimal number of communication rounds to reach agreement, and maybe you can still have an exponential speed-up compared to Aaronson's result. We don't know.
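To summarize the quantities in play (this is a paraphrase of the discussion above, not the paper's exact statements), (ε, δ)-agreement and the known bounds on the number of communication rounds $N$ look like this:

$$\Pr\big[\,|a_{\mathrm{Alice}} - a_{\mathrm{Bob}}| \le \varepsilon\,\big] \ge 1 - \delta, \qquad \Omega\!\left(\log \frac{1}{\varepsilon}\right) \;\le\; N \;\le\; O\!\left(\frac{1}{\varepsilon^2}\right).$$

For ε = 0.01, the upper bound is of the order of $1/0.01^2 = 10{,}000$ rounds, while the lower bound is only about $\log_2 100 \approx 7$; that is the exponential gap just mentioned.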
Yes, Louis? So, agreement in distributed computing. Yeah, it's interesting that the paper considers agreement when there are multiple agents and not all agents can communicate with all other agents. This is a setting that is very useful in practice, when we have a distributed system with each part of the system making observations about the world. For example, we can think of the recommender systems of today's world, with social media platforms that are not controlled by one central server but are massively distributed. All these servers have to implement algorithms to make decisions in a coherent way with one another, so somehow this sort of agreement problem has to be solved.

And one thing we often discuss on this channel is also the concept of Byzantine resilience, which is when you have a distributed system and you cannot fully trust all the parts of the system. How do you continue to do exactly what you want to do, without failing because of parts of the system that are working against you? And this is not mentioned at all in the paper. I expect that the solutions from this paper won't succeed at all in the presence of Byzantine agents: malicious participants and idealized, honest Bayesians maybe don't mix so well.

Well yeah, so I think there's interesting research to be done about Byzantine Bayesian agreement, or Byzantine Bayesian learning in general.

Maybe the last point is that there is the distributed computing community's own approach to consensus and agreement. And here there is no inference, no updating of priors. It's mostly about proposals, and we want to agree on a value that was proposed. So typically the consensus requirement is that, at the end, the value that was decided is a value that was proposed. Rarely do you see protocols where agents update what they propose. It's mostly about having a quorum, or a majority, proposing the same thing, and then reaching agreement because they propose the same thing.
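As a minimal sketch of that propose-then-decide flavor of agreement (a deliberately simplified one-shot majority quorum, with none of the fault tolerance that a real consensus protocol such as Paxos or Raft provides):

```python
from collections import Counter

def one_shot_quorum(proposals):
    """Decide a value only if a strict majority of nodes proposed it.

    Illustrates the validity requirement of consensus: any decided value
    must be one that some node actually proposed. It is NOT a real
    consensus protocol: no retries, no crash or Byzantine tolerance.
    """
    n = len(proposals)
    value, count = Counter(proposals).most_common(1)[0]
    return value if count > n // 2 else None  # None means no quorum, no decision

print(one_shot_quorum(["a", "a", "a", "b", "c"]))  # 'a': 3 of 5 is a majority
print(one_shot_quorum(["a", "a", "b", "b", "c"]))  # None: no strict majority
```

The contrast with the Bayesian protocol above is stark: here nothing is learned and no beliefs are updated; agreement is purely a matter of counting identical proposals.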
Now, the third community that also tackles agreement a lot is the community of game theory: social choice theory, aggregating preferences, voting systems. You can think of voting systems as an approach that humanity invented to solve the problem of reaching consensus and agreeing. And this is maybe another direction where the toolbox of Bayesian agreement could bring new, interesting problems people could work on. I don't know if Lê wants to add something on this.

Yeah, I think there's a lot there. These are three different communities, with different constraints, tackling the problem of agreement. And all of them have interesting features, and maybe the different features should be combined: some ideas from one side should be combined with the others, because there's a more overarching problem. For instance, take the case of social choice, of voting in general. One thing that we usually don't take into account when designing voting systems (and I've done research into this, so I know it a little bit) is that we usually assume, for instance, that the different agents know what they want. But if you combine this with more of a Bayesian approach, the point of Bayesianism is that people only have a guess about what they want, or about what they think about the world, let's say. They have uncertainty about this, and this uncertainty can change depending on the amount of data that they have. So I think voting under uncertainty is an interesting research area. Another interesting research direction would be to combine Bayesian agreement with distributed computing in the presence of Byzantine agents, for instance. And this is close to distributed machine learning, to things like federated learning. And you can also try to combine all three together, for more fun.

OK, maybe we can wrap up. Hopefully someone listening to this, from any of the three communities, will consider working on a problem at the intersection: between strategyproofness in game theory, Byzantine fault tolerance in distributed computing, and of course Bayesian agreement. Yeah. OK. See you in a week. See you. Bye.