That reminds me of the ERC interview, by the way, like 10 minutes ago, then shut up, right, sorry. So I'm going to talk today, and thanks first of all to the organizers for inviting me to a very interesting meeting. Compared to a lot of the really interesting, sophisticated stuff that we've seen in the previous days, this will look comparatively simple-minded, but maybe still useful in some way. But I'm going to start with a warning, and Munahiro, I hope you forgive me for taking your quote. I think this quote is due to you, because I looked it up on the internet and couldn't find it anywhere else: theory minus experiment equals mythology. Anyone who knows me... Sorry? You translated it. Anyway, anyone who knows me reasonably well will know that this probably describes my approach to science, because I do theory and not really experiment, and I think it has the status of mythology. Sorry, guilty as charged. But maybe I'll rewrite it, if I'm allowed, if you forgive me, and say: theory minus experiment equals prediction. And that means a prediction is an experiment waiting to happen. So maybe I'll have some predictions today that will be of interest to the experimental people out there. I'm going to make some predictions relating to, well, I'm going to discuss optimal decision-making by individuals, using optimality theory, and optimal decision theory for groups, both small and large. But then I'm going to try and bridge the two. And if you haven't been to Venice, this is the Rialto, and if you have the chance to visit while you're in this part of the world, please make sure you do. So I like building bridges; maybe this talk will achieve that. I'm going to start by discussing the wisdom of the crowd, this very old idea that some people have alluded to in their talks already.
And for anyone who doesn't know the story, it goes back to the days of country fairs, which we still have in the UK, in England, and the popular pastime of guess-the-weight-of-the-animal. Francis Galton took, as you'll probably know, a load of estimates of the weight of an ox, did some analysis of them, and got a result showing that the median of all the crowd's estimates was within 1% of the true weight of the ox. So crowds of relatively poorly informed individuals, combining their information, even if they're not aware that their information is being combined, can do better than lone individuals. I show the actual paper for two reasons. One is to show how easy it was to get a Nature paper a hundred years ago: one page and one graph. Also, I was looking at this last night, and I thought I'd highlight the opening sentence that Galton wrote: "In these democratic days, any investigation into the trustworthiness and peculiarities of popular judgments is of interest." Very true words for our times. Okay, well, that's people. I'm going to talk about animal groups, and I did a lot of the thinking that led to this first project, which is basically a review project, when I was wandering around the Kalahari following meerkats, or rather following meerkat followers, the people who go around logging the data. A meerkat group, or any other animal group that forages collectively, has a problem: how to avoid predation. Meerkats solve that by having certain individuals take on a role, shown here in this picture, known as raised guarding, where they basically stand on their hind legs on whatever they can find in the environment, in this case a driedoring bush. And they look around, they pay attention, and they have to ask themselves a question continuously.
Is there a predator present, in which case I should warn the rest of the group, or is there no predator present, in which case I can leave them to carry on foraging? We heard earlier in the week about vocalizations by meerkats, various things they do when they're cooperatively foraging, and one of those vocalizations is an alarm call. So I'll come to how we can solve that problem optimally at the individual level in a second. But before I do, I'll talk about another piece of lore in collective behavior, which explains why groups are typically more effective at making decisions than individuals, and that's known as the Condorcet jury theorem. It's just a very simple mathematical argument. So where Galton was data-driven, this is a mathematical, first-principles argument for why groups making binary choices should be more accurate than their constituent individuals. It's based on the binomial distribution, a very simple piece of maths. But it makes some assumptions. It assumes that the opinions of the individuals are independent, and that individuals have equal decision accuracy to each other. It has also been claimed to show that there's only a benefit from group decision-making when the accuracy of individuals exceeds one half. And another thing, indicated by this function here, is that only majority decisions are considered: whatever the majority of individuals in the group decide, that's taken to be the group decision. So what I'm going to do in this talk is question a couple of these assumptions, ask what happens when you relax them, and see whether things look any more interesting. But before I do, let me give you a simple numerical sketch of what happens. Here on the x-axis we have group size increasing, and on the y-axis the probability that the majority decision is correct.
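The Condorcet argument just described can be sketched in a few lines. This is a minimal illustration, not from the talk, assuming an odd group size n (to avoid ties) and a common accuracy p:

```python
from math import comb

def p_majority_correct(n, p):
    """Condorcet jury theorem: probability that a simple majority of n
    independent voters, each correct with probability p, picks the
    correct option in a binary choice. Assumes odd n (no ties)."""
    # Sum binomial probabilities over all outcomes where more than
    # half the voters are correct.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))
```

With p = 0.6 the group accuracy climbs with group size towards one, which is exactly the asymptoting curve described next; with p below one half the same formula sends group accuracy towards zero.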
And you can see it increases from 0.6, because all the individuals in this simulation have accuracy 0.6, and it asymptotes towards one. So large groups make the correct choice with high probability, basically. Now, I mentioned there are a couple of assumptions. The first one I'm going to relax is that individuals have equal decision accuracy. This section of the talk is all review; there'll be some new stuff after, which I'll highlight. If we relax this assumption, or rather recognize something from the biology, it's that individuals have variable-quality information in the real world. Real individuals have different decision accuracies for a variety of reasons, such as individual differences in ability, but also, very likely, access to different qualities of information due to spatial positioning, like standing on top of a bush versus foraging in the mud. And in individual decision-making, individual behavior should change according to information quality. This has been shown experimentally. Here, for example, is work by Adam Kepecs and colleagues on choice behavior by rats faced with an ambiguous stimulus. In this paradigm a stimulus predicts a reward, but you can manipulate the reliability of that stimulus. What Adam found was that the rats' willingness to wait before giving up on a reward they think they should be getting tracks their confidence: the better the quality of their stimulus information, the longer they are prepared to wait for the reward. And neural firing rates in orbitofrontal cortex predict the willingness to wait for reward, so there are neural correlates of confidence in the rat brain, and it's reasonable to think they exist in many other animals too.
The other thing to note is that the theory of confidence has also been incorporated into collective decision-making theory, and as I said, this is review, so I'm not the only person to notice this. I came at this from working with people in machine learning, where ensemble theory shows how to combine multiple weak classifiers, like neural networks, via voting schemes, so that the result gives superior performance to a single classifier: in other words, the wisdom of the crowd for machine learning. But it also shows exactly how individual classifiers, individual votes, should be weighted by accuracy. One way you can derive this is to assume that decisions are all-or-nothing, a zero-one loss function; you take an exponential upper bound on that and do some calculus to minimize the error added by each additional vote. What you find is that individual votes should be weighted according to the log of the odds ratio: the log of the individual decision-maker's accuracy over their error rate. This isn't new, by the way; as I said, this is all in the review, and other people have noticed the same weighting scheme. For example, Albert Cowell and colleagues have applied it to learning models, and Max Wolf and colleagues, including Jens, who's here, have applied it to optimizing medical panel decisions. So this is known, it's out there; what I'm doing is just reviewing the state of the art, what we know. But this shows what happens relative to Condorcet when we include confidence weighting. The lower curve is standard Condorcet, but now we have variable individuals: a very small number of high-quality individuals whose accuracy is 0.75, only three of them, and then 97 other individuals who are comparatively poor decision-makers, and on the x-axis we increase their accuracy from one half.
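That log-odds weighting rule is simple enough to write down directly. A minimal sketch (the function names are mine; votes are coded +1/-1 for the two alternatives):

```python
import math

def log_odds_weight(accuracy):
    """Optimal vote weight from ensemble theory: the log of the odds
    ratio of the voter's accuracy over their error rate."""
    return math.log(accuracy / (1 - accuracy))

def weighted_group_vote(votes, accuracies):
    """Group decision is the sign of the confidence-weighted vote sum."""
    total = sum(v * log_odds_weight(p) for v, p in zip(votes, accuracies))
    return 1 if total > 0 else -1
```

Note the behavior this produces: an uninformed voter (accuracy one half) gets weight zero, and a single confident expert can outvote several weakly informed dissenters, which is the mechanism behind the curves discussed next.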
So at one half they have no information, up to 0.75, where they're as well informed as the minority. And what we see is that the Condorcet curve goes like this, but with confidence weighting we get benefits even for very small group sizes. We get substantial increases in group accuracy, because the weighting scheme means that unconfident individuals who are poor at making decisions contribute hardly anything to the group decision, leaving those who are most competent to drive it. These are just numerical simulations, but they show you where the benefits from confidence weighting arise, and in what kinds of groups. Here are two examples for different values of the minority voters' accuracy, 0.6 versus 0.75, and, as you'd expect intuitively and can confirm through simulation, the situations with the biggest gains for confidence weighting over simple-majority Condorcet voting are when there's a reasonably high standard deviation of accuracy within the group, so variable-quality decision-making ability, and when the groups are relatively small. The red regions are where we have the biggest improvement over Condorcet's group decision-making accuracy. So if you're in a small group of variable individuals, it pays to weight votes by confidence. As I said, this is all in the review; people have noted this kind of idea in human collective decision-making, where of course there's great potential for communicating things like subjective confidence. Bahrami and colleagues have done opinion-pooling experiments where you have a negotiation phase before a group decision is arrived at, and as you'd expect, when you allow information pooling, the decisions of the pairs of experimental subjects increase remarkably in accuracy.
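The heterogeneous-group effect just described is easy to check by Monte Carlo. This sketch uses illustrative numbers of my own choosing (3 experts at accuracy 0.75 and 97 novices at 0.52; the talk's plot sweeps the novice accuracy across a range):

```python
import math
import random

def compare_rules(n_trials=20000, seed=1):
    """Compare simple majority vs confidence-weighted voting in a mixed
    group: 3 experts (accuracy 0.75) and 97 novices (accuracy 0.52).
    Returns (majority_accuracy, weighted_accuracy)."""
    rng = random.Random(seed)
    accs = [0.75] * 3 + [0.52] * 97
    weights = [math.log(p / (1 - p)) for p in accs]
    maj_correct = wtd_correct = 0
    for _ in range(n_trials):
        # +1 = voted for the correct option, -1 = voted for the wrong one.
        votes = [1 if rng.random() < p else -1 for p in accs]
        maj_correct += sum(votes) > 0
        wtd_correct += sum(v * w for v, w in zip(votes, weights)) > 0
    return maj_correct / n_trials, wtd_correct / n_trials
```

With these numbers the weighted rule is clearly more accurate than the simple majority, because the three experts' large weights dominate the near-zero weights of the novices.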
But are there simpler mechanisms for determining individual confidence, and combining confidences, than verbal negotiation in human groups? The way I'm going to tackle this is by looking at how very low-level, early sensory decisions are made in the brain. So this is psychophysics, and we already saw this random dot field in Dan's talk. With a random dot field you have a stochastic stimulus: there's some signal in there, and some noise. And you want to make a decision as effectively as you possibly can given the information the stimulus contains. Well, it turns out the statistically optimal solution to this, the one that optimally compromises between the expected speed and the expected accuracy of the decision, reduces down to what's known as the drift-diffusion model. This is very elementary physics, of course. You describe the decision process as a particle moving along an evidence line towards decision thresholds for the two alternatives, in this case rightward versus leftward motion. There's a constant tendency to move towards the correct decision boundary, whose magnitude is proportional to the stimulus signal. But there's also a Brownian motion component, white noise in the particle's motion, whose standard deviation, or variance, is proportional to the noise in the stimulus. So that's an optimal decision-making mechanism, and there are neural circuits that can implement it, or approximate it, in a variety of neurobiologically plausible ways. What I want to show you next is work by Mike Shadlen, shown up here, and his colleague Kiani, on how you can get from the drift-diffusion model to actually estimating subjective confidence.
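That particle description simulates directly. A minimal sketch, with parameter names of my own (drift proportional to stimulus signal, noise to stimulus noise):

```python
import random

def ddm_trial(drift, noise_sd, threshold, dt=0.001, rng=None):
    """One drift-diffusion decision: evidence x drifts toward the correct
    boundary at rate `drift`, with Brownian noise of standard deviation
    `noise_sd`, until it crosses +threshold (correct choice) or
    -threshold (error). Returns (correct?, decision time)."""
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        # Euler step: deterministic drift plus scaled Gaussian noise.
        x += drift * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
        t += dt
    return x > 0, t
```

Running this with a strong versus a weak drift shows the key behavioral signature: strong signals give fast, accurate decisions; weak signals give slow, error-prone ones, with the speed-accuracy trade-off set by the threshold.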
Because Adam Kepecs, with the rats and odours, had an observation of confidence in neural correlates, but here we can go back to first principles. What they show is that if you're making a decision using a drift-diffusion process, but there's some uncertainty about the quality of the information you've been presented with, that means you don't know the signal-to-noise ratio. You might be looking at a good stimulus or a bad stimulus; you don't know a priori, so you need to infer that somehow from your decision-making process. This plot shows the probability mass function of where the decision variable, the particle, is over time, with time along the x-axis: zero in the middle, so that's zero evidence, positive evidence in the top half, negative evidence in the bottom half. And suppose you work under what's known as the interrogation protocol: at some point in time you're asked, okay, what's your decision, but also how confident are you in it? What happens is this: on one particular trial your evidence moves around randomly, but with drift towards the correct boundary. When you're asked to declare your decision, you'd first answer based on the sign of the evidence, but you could also work out your confidence in that decision, because where you actually moved to is information about the quality of the stimulus, the quality of the information you were integrating. That looks a little bit messy perhaps, but we can take a slice through it and ask, for some decision-variable value:
How does our confidence evolve over time? It turns out this can have a very simple functional form: you can approximate it nicely as piecewise linear. The quantity starts reasonably high, tails off linearly, and then flattens out. And what is this quantity we're plotting? It's the log odds ratio. So that's exactly what we were looking for in optimal group decision-making as well. I find this quite nice, because it's one of the bridges I'd like to build between individually optimal and collectively optimal decision-making: individually optimal decision-makers can compute, reasonably easily, the quantity they need to optimally combine their votes with those of their group-mates. This leads to a few simple predictions. First of all, animals should account for decision reliability when they're signaling to, or integrating signals from, group-mates. You have to pay attention to possible conflicts of interest, though: who should do the weighting varies according to the relatedness of the group. In highly related groups, you can expect signalers to want to reliably signal their confidence. But in unrelated groups there's the risk of manipulation; for example, in collective foraging there's the risk of a false alarm call, so everyone runs away and the caller can go in and get the food, which you actually see even between species in the Kalahari, never mind within species. So there should be selection at some level within the group for attending to, or signaling, this kind of confidence information.
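The mapping from decision-variable position to log odds just described can be sketched numerically. This is my discrete simplification of the Kiani and Shadlen setup: the unknown signal quality is a drift magnitude drawn uniformly from a small set, and under drift a the decision variable at time t is Normal(a*t, noise_sd**2 * t):

```python
import math

def log_odds_correct(x, t, drift_mags, noise_sd=1.0):
    """Posterior log-odds that the true drift is positive, given the
    decision variable sits at x after time t, with the drift magnitude
    unknown (uniform prior over drift_mags and their negatives)."""
    var = noise_sd ** 2 * t
    like = lambda a: math.exp(-(x - a * t) ** 2 / (2 * var))
    pos = sum(like(a) for a in drift_mags)   # evidence for positive drift
    neg = sum(like(-a) for a in drift_mags)  # evidence for negative drift
    return math.log(pos / neg)
```

This reproduces the declining slice through the plot: the same evidence level x is worth less confidence if it took longer to reach, because slow arrival suggests a weak stimulus.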
And from the link with individually optimal decision-making theory: animals could use hesitancy as a cue for inferring confidence in their own or others' decisions, because under the drift-diffusion process, poor-quality information means it takes a long time to reach a decision threshold. So if an animal or a human is taking a long time to reach a decision, that's actually good evidence that they have access to poor-quality information, and you should down-weight whatever decision they do eventually reach. Okay, so that's the review stuff, from a review article we wrote last year, as you just saw; it was really trying to link things together, but nothing new. So let's look at something new now, by relaxing the other main assumption of the Condorcet jury theorem, which is that the benefit from group decision-making only arises when individuals on average make fewer mistakes than correct classifications, that is, when the accuracy of individuals is greater than one half. This seems to follow because if you redo the Condorcet plot with accuracy less than one half, then as you increase the group size, group accuracy approaches zero instead of one. So it seems there's a lower bound on the individual accuracy at which you should allow group voting to occur. But this neglects something rather important, which is signal detection theory. Here is Green, of Green and Swets, for anyone who knows about signal detection theory. Signal detection theory recognizes that accuracy is not a one-dimensional quantity, as it's treated in the Condorcet theorem, because there are two kinds of mistakes you can make in a binary choice.
Signal detection theory in its simplest form assumes you have a scalar random variable, some continuous quantity, which comes from one of two distributions, shown here, and really you're trying to say whether something is there or is not there. If you say something is there when it is there, that's called a hit, or a true positive; if you don't say it's there when it is there, it's a miss, or a false negative. On the other hand, if you say something is there when it's not, like a predator for example, it's a false alarm; and if you say something isn't there when it really isn't, it's a correct rejection. And what signal detection theory realizes is that it's probably the exception rather than the rule that the costs of these different errors are the same as each other. What you really expect is highly asymmetric errors. Signal detection theory grew out of looking for bombers during the early days of radar, a highly asymmetric case: the cost of raising a false alarm, of scrambling a fighter squadron to intercept, is different from the cost of missing the bomber that comes in and drops its bomb. There are also different prior probabilities of the two states of the world. Most of the time there is no bomber, so your prior can be that there isn't a bomber there, but occasionally there is. What signal detection theory does for you is enable you to calculate how you should set the threshold on this random variable so as to optimize your expected loss, or expected gain, from these classification decisions, taking account of the different error costs and the prior probabilities. And this is done through the ROC curve, or receiver operating characteristic curve: how well separated these two distributions are defines the difficulty of your decision.
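For the equal-variance Gaussian case just sketched, the optimal threshold has a closed form. A sketch assuming unit-variance distributions (absent: mean 0; present: mean d') and zero cost for correct responses; the function name and parameters are mine:

```python
import math

def optimal_criterion(d_prime, p_present, cost_fa, cost_miss):
    """Optimal threshold x* on the observation axis: respond 'present'
    when x > x*. From the likelihood-ratio test, respond 'present' iff
    LR(x) > beta, where beta folds in the priors and the error costs."""
    beta = ((1 - p_present) * cost_fa) / (p_present * cost_miss)
    # For unit-variance Gaussians, LR(x) = exp(d' * x - d'**2 / 2),
    # so LR(x) > beta rearranges to:
    return math.log(beta) / d_prime + d_prime / 2
```

Rare signals or expensive false alarms push the criterion up (fewer "present" responses); expensive misses push it down, which is exactly the movement of the realized operating point along the ROC curve described next.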
As you move your threshold, that gives you a set of possible values you can achieve as a decision-maker, trading your true positive rate against your false positive rate. When the decision is hard you'll be very close to the diagonal; when the decision is easy you'll be up near the top-left corner. Then your problem as a decision-maker is to choose the threshold that gives you a realized pair of these values: a realized true positive rate and a realized false positive rate. And the values you should realize depend on those costs and prior probabilities. Here's one example, which might correspond to some asymmetric costs or some asymmetric prior on the state of the world. Shifting those costs or shifting those priors moves the optimal decision point, and thus the realized true positive and false positive rates, along the ROC curve. So again, this is individually optimal decision-making theory; it's been around for 50-plus years. Max Wolf and colleagues, including Jens, who's in the audience, as I said, have already noticed that true positive rates and false positive rates need to be accounted for separately. And I proposed how you could simultaneously improve true positive and false positive rates towards one, which is the best possible outcome, through quorum sensing: a very simple but effective idea, now at least four years old in our literature. The idea is this (if you don't follow it, read the paper): in positive states of the world, the proportion of individuals voting positive will converge to their accuracy in positive states, the true positive rate; in negative world states, it will converge to the false positive rate. So you just let everyone choose, and then select a proportional quorum threshold between those two rates.
Then, according to which side of the quorum the number of positive votes falls, you make your classification at the group level, and you simultaneously optimize both true positive and false positive rates. In other words, you become a perfect decision-maker who never makes mistakes under either possible state of the world. And you can always do this when the true positive rate is greater than the false positive rate, because then you can choose the quorum threshold to lie between those two rates. And actually that condition is guaranteed by signal detection theory. Let me briefly explain why. The reason the true positive rate must exceed the false positive rate is that if you're down here, in the dark grey triangle where the true positive rate is below the false positive rate, you're a systematically bad decision-maker in both states of the world. So all you need to do is reverse your predictions all the time, and you flip from that triangle to the triangle in the top left, simultaneously improving both your true positive and false positive rates. That's good, because it means individually optimal decision-makers, or groups of them, should always be able to use the quorum trick to simultaneously optimize true positive and false positive rates at the group level. So again, that's all existing stuff. Here's the new stuff, the exciting stuff, which we're about to submit; at least I'm excited about it. All right: true positive rate less than false positive rate is impossible, so we don't care about that region down here; we'll never see optimal decision-makers there. In the white region, Condorcet's predictions are correct, which you can take as being equivalent to saying that using a simple majority is optimal, in that it will always improve group decisions.
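The quorum trick just described can be demonstrated in a few lines. A sketch with illustrative numbers of my own (the function name and parameters are mine): each individual votes "positive" with probability equal to the true positive rate in positive states and the false positive rate in negative states, and the quorum is set between the two rates.

```python
import random

def quorum_group_decision(n, tpr, fpr, state_positive, quorum, rng):
    """Quorum rule: the group declares 'positive' iff the fraction of
    positive votes exceeds the quorum. With fpr < quorum < tpr, large
    groups become right in both states of the world."""
    p = tpr if state_positive else fpr
    positive_votes = sum(rng.random() < p for _ in range(n))
    return positive_votes > quorum * n
```

With, say, tpr = 0.6, fpr = 0.3, and a quorum of 0.45, a group of a few hundred voters is almost never wrong in either world state, even though each individual is only modestly accurate, which is the "perfect decision maker" claim above.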
So when the true positive rate and false positive rate put you in this region of the ROC space, Condorcet works correctly and majority voting is optimal. But first a brief aside: we have to relate true positive rate and false positive rate to the accuracy that Condorcet is actually concerned with, and it's pretty straightforward to do that. It's just expected accuracy: accuracy in the positive state of the world, weighted by the probability of being in that state, plus accuracy in the negative state, weighted by its probability. So the accuracy for the Condorcet calculation is just the expected accuracy, given your prior probabilities and your accuracies in the two states of the world. What this means is that there are two regions of the ROC space where Condorcet makes incorrect predictions and simple majority voting is suboptimal. I'm going to take you through a number of errors and when they occur. So, when are Condorcet and simple majority voting wrong? There are several ways Condorcet can be wrong, and they all correspond to cases where majority voting is suboptimal. The first error, which I call error 1a, is that Condorcet predicts group accuracy approaches one, but simple-majority groups do not approach one. When does this occur? Here is a delineation of the space of possible decision problems where this error occurs: the grey region. The space works like this: along the x-axis we have the prior probability of the positive state of the world, the decision ecology if you like, and along the y-axis we have the ratio of the costs of false positives versus false negatives. That ratio tells you how costly the two kinds of errors are relative to each other. The x-axis is on a linear scale, the y-axis logarithmic.
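The mapping from (true positive rate, false positive rate) to Condorcet's scalar accuracy is one line. The worked numbers below are my own illustration of how the scalar can hide a failing state:

```python
def expected_accuracy(tpr, fpr, p_pos):
    """Condorcet-style scalar accuracy implied by a (tpr, fpr) pair:
    accuracy tpr in the positive state weighted by its prior, plus
    accuracy (1 - fpr) in the negative state weighted by its prior."""
    return p_pos * tpr + (1 - p_pos) * (1 - fpr)
```

For example, tpr = 0.95, fpr = 0.6, and a positive-state prior of 0.8 give an expected accuracy of 0.84, well above one half, so Condorcet predicts convergence to one; yet accuracy in the negative state is only 0.4, below one half, so simple majorities fail whenever the world is in that state, which is the kind of case behind the errors described next.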
So we can see that over quite a lot of the space of possible decision problems, Condorcet makes this error and a simple majority vote goes wrong. Let me show you how it goes wrong. In this example, when you apply a majority decision rule, the group accuracy doesn't converge to one; it actually converges to the prior probability. The blue lines are averages of simulations of majority-voting groups. Condorcet predicts they should be approaching one, because the expected accuracy is above one half, but they actually converge on the prior probability of the positive state of the world. They are, however, still an improvement relative to the individuals, who all have an accuracy just above 0.8. Okay, so Condorcet makes a bad prediction and majority voting doesn't work; how can we fix it? The answer, of course, is to apply the quorum rule and choose the quorum value appropriately. And when we do that, we see that the group does indeed approach accuracy one. Let's look at another error, closely related to the first, in fact a refinement of it: Condorcet predicts group accuracy approaches one, but simple-majority groups do worse than individuals. Again, the expected accuracy is still above a half, so Condorcet predicts that as we increase group size we should approach one. But we can find situations where, although the group accuracy doesn't go to zero, it goes to a value lower than the accuracies of the individuals within the group. Again we can fix that by applying the quorum rule from Max and colleagues and setting the quorum appropriately; we rescue the situation and the group accuracy approaches one, as we would hope. The final error is that Condorcet predicts group accuracy approaches zero, because the expected accuracy is below one half, but that doesn't happen to simple-majority groups.
Here's an example of that: although the group accuracy does decrease, it doesn't go down to zero; it actually approaches the frequency of one of the states of the world. Again we can fix that by applying the quorum trick, and again we rescue the group decision-making and have it approach one as group size increases. So why is this interesting? Well, if you ask yourself when Condorcet makes incorrect predictions, maybe that seems a bit esoteric, a fairly self-referential exercise that's perhaps not relevant for the real world. But it actually turns out to be equivalent to asking when simple majority voting is the right thing to do. And if we ask that question, we just take the union of all the sets I've shown you: this white region here is where Condorcet's predictions are correct and majority voting is the right thing to do. So we actually see that majority voting is only the right thing to do in a very narrow range of possible decision scenarios. That's interesting because a lot of people have made claims over the years about the majority rule being very robust, or even optimal, and what this analysis shows is that that really holds only for a very special set of cases. What the analysis also shows us is that when we're above this region, on the top left, we should use a sub-majority quorum, and when we're in the decision region below right, we should use a super-majority quorum. I'll decode those in a little bit, because what we're doing here is relating quorum usage, quorum characteristics, to decision ecology. So here are the predictions: quorums should be a robust feature of decision-making, even in single-shot decisions without conflicts of interest.
Quorum thresholds have been proposed for ongoing decision-making processes, to manage the speed-accuracy trade-off, and in single-shot decisions quorums have been proposed to manage conflicts of interest within the group. But actually, even if you have a group of individuals making a single-shot decision, where they're all agreed on what the best outcome would be, even if they have different information about what the best outcome is, they should still use quorums very frequently. In particular, if the negative state of the world is rarer, and/or false negatives are relatively expensive compared to false positives, we should find super-majority quorums in use. Conversely, we should find sub-majority quorums in use whenever the positive state of the world is rarer, and/or false positives are relatively expensive. And of course, for a particular decision scenario with a real organism, you just have to map positive and negative onto different states of its environment, for something the organism cares about. Just a proposal: bacterial quorum sensing is well studied, so perhaps that would be a good system to look at. I'd also be very interested in talking to anyone who works on vertebrates and so forth who thinks they might be able to test some of these ideas. So that's really it, but I'll give a brief advert for what we're doing next, because that was all relating to animal behavior. What we're currently looking at is whether we can embed the rules we derived from these very simple models in more complex environments, for example heterogeneous networks, and see how well they work. Here's one of our algorithms, based on confidence sharing, converging in a spatial network over time. And we're also trying to translate the insights from the decision theory into the design of artificial, engineered, decentralized systems.
So we're using Kilobot swarms, and this is Giovanni, who's given a talk already, who is in my group and working on these aspects as we speak, almost. This is all part of a larger project funded by the ERC called Distributed Optimal Decision Making Algorithms, where we're trying to establish the two-way bridge between biology, decision theory, and engineering. So maybe it's not a two-way bridge, maybe it's a three-way or four-way bridge, but it seems to be fairly productive so far. Okay, thanks for your attention. Any questions?