Okay, so welcome everyone to our last session of the day. We're going to hear from Aleksandra Walczak, who's going to tell us about learning the interaction structure in flocks of birds. — Okay, well, thanks for coming, despite all the other attractions going on at the same time, like Naftali Tishby's talk and the sea and the sun. I'm going to talk about learning from data in birds. This is what we've been doing over a number of years now with Bill Bialek, Andrea, Irene, Oliver, Thierry Mora, and Massimiliano, and a bunch of other people, but these are the people who took part in the part of the work I'm going to tell you about today. And the general name of the game is: how do you get from this to something like this? That's what I'm going to try and take you through. So this is the obligatory movie, so that you're amazed and see that all these birds actually move together and form the domains Andrea mentioned; it's correlated motion, it looks amazing, and now we can move on. Okay. So you've seen the movie, and from now on we're going to go from these snapshots to these kinds of pictures, where you see the velocity vectors drawn either in the lab frame or in the center-of-mass frame. In the center-of-mass frame you see these domains emerging that Andrea described, and the question we've been asking ourselves is: what kind of communication between these birds is at the origin of global order? These are starlings — I, for one, as a non-biologist, was very surprised to learn that these are starlings, very non-exotic birds that do these amazing things. So we're going to work with this picture of the center-of-mass velocities, where we clearly see global order, the whole flock behaving in one way, and ask how that comes about.
So just to convince you — although since you're here, I'm guessing you're already convinced, so I'm preaching to the choir — you do see global order. If you calculate the correlation functions between the direction vectors of the velocities as a function of distance, you see that they are correlated over very, very large distances. And if you look at flocks of different sizes — the flock sizes change a lot, and the number of birds changes about tenfold — you see that the size of the correlation domain, the domain that looks like one dark blob moving in the same direction, scales with the size of the flock. What that means is that these seem to be scale-free correlations: there is no characteristic scale. It's not like there's a scale of ten meters and things are correlated at ten meters, or a hundred meters, or one meter. There's no characteristic scale; that's what scale-free means. And that suggests the order is self-organized: it doesn't come from some external factor that sets the right length scale for the system, given from the outside, but actually comes from the behavior of the flock itself. That's at least what we tend to believe every time we see this kind of scale-free behavior. So we would like to convince ourselves a bit more that this is true, and not just an artifact — maybe we didn't measure large enough flocks, or we did something wrong. How do we do that? Well, as I said, the name of the game is to turn these beautiful movies into actual understanding in the form of models. And this is essentially what science has been doing for a very long time.
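To make the observable concrete, here is a minimal sketch — my own illustration, not the speakers' analysis code — of how the direction correlation function C(r) can be computed from a single snapshot, assuming positions and velocities stored as NumPy arrays of shape (N, 3):

```python
import numpy as np

def direction_correlation(positions, velocities, n_bins=20):
    """Spatial correlation of velocity-direction fluctuations, C(r).

    Fluctuations are taken about the flock's mean direction, in the
    spirit of the scale-free correlation analysis described in the talk.
    """
    # Unit direction vectors, and their fluctuations about the mean
    s = velocities / np.linalg.norm(velocities, axis=1, keepdims=True)
    ds = s - s.mean(axis=0)

    # All pairwise distances and dot products of the fluctuations
    i, j = np.triu_indices(len(s), k=1)
    r = np.linalg.norm(positions[i] - positions[j], axis=1)
    c = np.sum(ds[i] * ds[j], axis=1)

    # Bin pairs by distance and average the dot products in each bin
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    which = np.clip(np.digitize(r, edges) - 1, 0, n_bins - 1)
    C = np.array([c[which == b].mean() if np.any(which == b) else np.nan
                  for b in range(n_bins)])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, C
```

Scale-free behavior would show up as the zero crossing of C(r) moving out proportionally to the flock size when this is applied to flocks of different sizes.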
And so here is Stefania, whom you either already know or, by tomorrow, will know — though maybe you won't recognize her in this amazing gear, where she's actually taking data. The idea of a good model is one that actually reproduces reality, that matches the data. And you can make sure of that in two ways. You can propose a model and then check it against data: you have some model, and you calculate an observable, such as the correlation function. In a way, this is what a lot of us are trained to do, but mostly on models handed to us by older generations. But another thing you can do is look out of the window, observe reality, and try to formulate the model based on those observations. In this specific situation, what we're after are the equations of motion of this system — we'd really like the equivalent of Newton's equations for a more complicated system. We're all told that Newton figured it out because an apple hit him on the head, but of course what he really did is observe a lot. That's what Stefania is doing here too, observing with somewhat more up-to-date tools. So we are going to analyze the results of these very, very hard experiments. I'm hoping Stefania will tell you more tomorrow, maybe about midges, and about how hard it actually is to acquire this data — it's very non-trivial, but I won't go into that. I'm just going to come in at the point where I already have these velocity vectors and I want to understand this system. As I said, this is the question we're trying to get at: how do we get this global order? And the level of description we're going after is similar to that of describing a gas.
We could go after each particle separately, track it, and describe its trajectory, or we could step back and simply describe the probability distribution of velocities of the gas particles in a given room, without caring about precise identities. That's what we're going to do with birds: we're going to ask what the distribution of bird velocities is, without caring what each specific bird does at a given time. So we're going to be mean to the birds and ignore their wishes, demands, and personalities. Okay, and how are we going to do it? I'll tell you about the models in a second. In general, we're going to assume the most general possible model, but still require that it reproduces certain features of the data — observables such as the correlation function. So we're going to constrain the correlation functions of the velocities of the different birds. Now I'm going to go into a few slides of gory details. If you're not interested in the details of the model, you can just believe me that there is a way of learning the probability distribution of the velocities of these birds which is completely analogous to what we do when we learn the distribution of particles in a gas — and we've been doing that for about a hundred years, so you can trust that we know what we're doing. Or, if you're interested, here we go. We're going to make a few assumptions. We'll say that each bird, indexed by i, has a velocity v_i, and we're going to normalize this velocity. For the time being I only care about direction, not speed. So I don't care how fast each bird goes; I look at the direction of its velocity. In terms of my little arrows, I only care where each arrow is pointing.
As I said, I'm going to build a model from a class of models called maximum entropy models. In a maximum entropy model, you write down the entropy of the distribution of these directions and demand that it be as large as possible, which means constraining the distribution as little as possible. However, I am going to constrain certain observables, and in this case we constrain specifically the correlation function. We say that the correlation functions calculated from the model should be exactly the same as the correlation functions of the flight directions of the birds as observed in the data. And those we can calculate from the data — well, "easily" — I showed you examples. So this is just a big matrix that we actually know. Then, as I said, we maximize the entropy subject to these constraints: we write down the entropy, add the observables we want to constrain, enforced by Lagrange multipliers, and also require that the probability distribution is normalized, because probability distributions are generally normalized — that's this term. Then we maximize over all possible models. These are the Lagrange multipliers that enforce the constraints. When we do this variation — it's just a calculation — we find that the probability distribution maximizing this functional has this form. The Lagrange multipliers get translated into these parameters: one for the normalization goes in here, and the others enforce the correlations.
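Schematically, the maximization just described can be written out as follows — a standard reconstruction of the maximum-entropy calculation, with λ_ij the Lagrange multipliers that become the couplings J_ij:

```latex
% Maximum-entropy functional: entropy minus Lagrange terms enforcing the
% pairwise direction correlations and the normalization of P
\mathcal{F}[P] \;=\; -\!\int \! d\vec{s}\; P(\vec{s})\ln P(\vec{s})
  \;-\; \sum_{i<j}\lambda_{ij}\Big(\langle \vec{s}_i\!\cdot\!\vec{s}_j\rangle_P
        - \langle \vec{s}_i\!\cdot\!\vec{s}_j\rangle_{\rm data}\Big)
  \;-\; \mu\Big(\!\int \! d\vec{s}\; P(\vec{s}) - 1\Big)

% Setting the variation \delta\mathcal{F}/\delta P = 0 gives the
% Boltzmann-like form, with the multipliers becoming the couplings J_{ij}
P(\{\vec{s}_i\}) \;=\; \frac{1}{Z(J)}\,
  \exp\!\Big(\tfrac{1}{2}\sum_{i\neq j} J_{ij}\,\vec{s}_i\!\cdot\!\vec{s}_j\Big)
```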
And you get this form which, depending on where you're coming from, looks familiar. It basically tells you that the direction of one bird is tied to the direction of another bird through this interaction matrix, J_ij. And since these are continuous variables — we're talking about actual directions of velocity, arrows that can point anywhere on a sphere — this is equivalent to a Heisenberg model on a lattice. But the lattice is now given to us by the data: it's not a regular lattice, it's a bird-network lattice. And if you happen not to like maximum entropy approaches, you shouldn't get upset just yet, because this turns out to be completely equivalent to doing maximum likelihood, assuming this exponential family of functions. So instead you can forget all about maximum entropy and just say: I assume that my probability distribution of velocities is chosen from a family of functions in which the velocity of one bird depends on the velocity of another through this interaction matrix. I'm just considering the most general class of functions in which birds are allowed to interact, and asking: what interaction matrix — what strength, what form of interaction — best fits the data? Then, through Bayes' rule, I connect the probability of a given interaction matrix given the data to the likelihood — the probability of the data given a set of parameters, in my case the interaction matrix — together with some prior over my class of possible interaction matrices. And if I rewrite this for this specific problem, I get a product over all the data points of the probability of the directions given the interaction matrix, times a prior, and I'm going to assume a flat prior.
I assume that a priori I have no preference for any type of interaction matrix. Then, if I maximize this likelihood — because that's what it now is, a likelihood function given this family of functions — I end up with this kind of expression, where again Z is the normalization. And you can see that this is equivalent to matching the correlation function of the model: if you take the derivative of this normalization function with respect to J_ij, you simply get back the correlation between the directions of the birds, and this term is simply the correlation structure from the data. So what this shows is that, as long as you've assumed an exponential family of functions, a maximum likelihood approach is completely equivalent to demanding that the correlation functions of the model and of the data be the same. So we can hopefully move on, without any further philosophical issues about learning from data, and actually implement this. Since I'll be talking about other types of inference a bit later, let me say that this maximum entropy — and also maximum likelihood — approach doesn't only work for pairwise correlations between, say, velocity directions, or pairwise correlations between anything. It works for any set of observables you implement using Lagrange multipliers. You simply get more complicated expressions — not just s_i · s_j, the direction of this bird times the direction of that bird; you could have more complicated terms here. But nothing in the idea of the approach fails. So if you believe some other observable is more important, you can just put it in and work with that. Okay, so back to birds. In the end, we end up with this probability distribution that couples the flight directions of the birds through this interaction matrix.
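The equivalence just described can be written compactly; this is the standard exponential-family identity, in the notation above:

```latex
% Log-likelihood of M observed snapshots under the exponential family
\mathcal{L}(J) \;=\; \sum_{m=1}^{M}\Big[\tfrac{1}{2}\sum_{i\neq j}
  J_{ij}\,\vec{s}^{\,(m)}_i\!\cdot\!\vec{s}^{\,(m)}_j \;-\; \ln Z(J)\Big]

% Stationarity with respect to J_{ij} matches model to data, because
% \partial \ln Z/\partial J_{ij} is exactly the model correlation
\frac{\partial \mathcal{L}}{\partial J_{ij}} = 0
\quad\Longleftrightarrow\quad
\langle \vec{s}_i\!\cdot\!\vec{s}_j\rangle_{\rm model}
  \;=\; \langle \vec{s}_i\!\cdot\!\vec{s}_j\rangle_{\rm data}
```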
Coming back to non-equilibrium physics — to statistical physics generally — we know that from this kind of problem we can derive a Langevin equation by taking the derivative of the Hamiltonian. And this reproduces dynamical equations that look like the typical models for interacting agents used in active matter or collective behavior. You now see that the direction of one bird is influenced by the directions of all the other birds, weighted by the strength of the interactions, plus some noise. So constraining the correlation function, which leads to this kind of probability distribution for the directions of birds, is completely equivalent to these kinds of models. Of course, it does not mean that this is the correct dynamics, or the only possible one. These are all effective models, and these interaction matrices are effective interactions. It doesn't mean there's some special channel of direct communication between the birds — this could be transmitted through fluctuations of the environment, or in many other ways. It's an effective model. But right now what we have is this interaction matrix: which bird speaks to which bird, with what strength. That's rather a lot of parameters, and since our goal is really to work with data and actually do the inference, we'd like to simplify our life a little. So we take this interaction matrix and parametrize it: birds interact with each other if one is among the other's n_c nearest neighbors. The bird counts up to n_c, integrates information with equal strength from all of those neighbors, and ignores everybody else.
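As an illustration only — not the speakers' simulation code — here is a minimal Euler-step sketch of that kind of alignment dynamics, for an arbitrary interaction matrix J:

```python
import numpy as np

def simulate_alignment(J, n_steps=200, noise=0.1, dt=0.1, seed=0):
    """Euler sketch of the alignment dynamics implied by the pairwise model.

    Each bird's direction relaxes toward the J-weighted sum of the other
    birds' directions, plus noise; directions are renormalized to unit
    length each step (speed is ignored, as in the direction-only model).
    """
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    s = rng.normal(size=(n, 3))
    s /= np.linalg.norm(s, axis=1, keepdims=True)
    for _ in range(n_steps):
        drive = J @ s                    # social force from the neighbours
        s = s + dt * drive + np.sqrt(dt) * noise * rng.normal(size=(n, 3))
        s /= np.linalg.norm(s, axis=1, keepdims=True)
    return s

def polarization(s):
    """Magnitude of the mean direction: near 1 for strong global order."""
    return np.linalg.norm(s.mean(axis=0))
```

With an all-to-all J and modest noise, a random initial condition orders spontaneously, which is the "global order from interactions" phenomenon the inferred model is meant to capture.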
This is actually equivalent to constraining not the correlation function over the whole flock, as I described before, but a correlation function over the n_c nearest neighbors, which we can then average over the different neighborhoods of the different birds. And because of this, we can use a single velocity snapshot — of course, to get a velocity snapshot we need at least two real frames — and do spatial averaging instead of ensemble averaging. We don't need to see the same kind of flock a thousand times, because we have many instances of the same kind of neighborhood within one flock. So this is maybe the moment I disappoint you: I'm going to talk about these static descriptions today; Thierry will talk about the movies tomorrow. So no movies today, sorry. Right, so that's the framework; now we actually have to fit the data. We have two parameters in the problem: J, the strength of the interactions, and n_c, the number of interacting neighbors. We maximize the likelihood over J and over n_c. The J part we can do analytically; the n_c part we do numerically, and this is what it looks like. You see that it has a very, very well-defined peak. This is one snapshot for one flock; these are different sets of snapshots, and you see the peak always falls in the same place and is well-defined. In fact, when we do this fitting we make one more approximation, called the spin-wave approximation, which is a fancy word for the fact that the flock is strongly polarized: there's a general direction of flight, and all the birds are flying more or less in that direction, their velocity vectors closely aligned with the mean direction up to small fluctuations.
So what we do is separate each velocity into the component parallel to the general direction and the component perpendicular to it, and we only bother with the perpendicular part, because those are the fluctuations about the mean. The general idea is that there is one common direction and the flock is very, very polarized. If we do that, the description stays the same kind of description, but in terms of this perpendicular vector everything becomes Gaussian — and when everything is Gaussian, life is infinitely easier. That is in fact how we do this learning. One more thing should be said, which is actually very important, and I'm sure somebody would ask: what about the boundary birds? We fix the boundary birds. The boundary birds — and this is an idea that also came up in Offer's talk — do have a special role: being on the outside, they have more information. I think it's a very interesting problem to try and figure out what drives their behavior, but it's a very hard problem. So we don't do that; we just say we know the velocity, the direction, of the birds on the boundary, and we keep it fixed to what's in the data, just as we fix the actual positions of the birds, and we worry only about the interior birds. And since these flocks are large — they can go up to a few thousand birds — we still have a lot of work to do. Okay, so that's it: we can learn the model, and then we can see whether we can answer some potentially interesting biological questions. But the first thing we want to do is check whether we learned the model correctly, and to do that, we see whether we can predict quantities that we did not put in when learning the model.
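To make the fitting procedure concrete, here is a hypothetical sketch of the profile likelihood for one candidate n_c in the Gaussian (spin-wave) approximation, with the boundary birds conditioned on as just described. The function name and the handling of constants are my own; the analytic maximization over J and the numerical scan over n_c follow the logic of the talk:

```python
import numpy as np

def spinwave_loglik(positions, pi, boundary, n_c):
    """Hypothetical sketch of the spin-wave (Gaussian) profile likelihood
    for one candidate neighbour count n_c; illustrative, not the
    published analysis code.

    positions: (N, 3) bird positions; pi: (N, 2) transverse velocity
    fluctuations; boundary: boolean mask of boundary birds, held fixed.
    The coupling J is maximized analytically; n_c is scanned outside.
    """
    n = len(positions)
    # n_c-nearest-neighbour adjacency (symmetrized), then graph Laplacian
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    order = np.argsort(d2, axis=1)          # column 0 is the bird itself
    A = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_c)
    A[rows, order[:, 1:n_c + 1].ravel()] = 1.0
    A = 0.5 * (A + A.T)
    lap = np.diag(A.sum(axis=1)) - A

    # Condition on the boundary: Gaussian over the interior birds only
    interior = ~boundary
    L_II = lap[np.ix_(interior, interior)]
    L_IB = lap[np.ix_(interior, boundary)]
    mu = -np.linalg.solve(L_II, L_IB @ pi[boundary])
    dev = pi[interior] - mu

    # Profile out J analytically: J* = (#dof) / (quadratic form)
    quad = np.einsum('ia,ij,ja->', dev, L_II, dev)
    ndof = dev.size
    J_star = ndof / quad
    _, logdet = np.linalg.slogdet(L_II)
    # log-likelihood up to n_c-independent constants
    return 0.5 * pi.shape[1] * logdet + 0.5 * ndof * np.log(J_star), J_star
```

Scanning this over a range of n_c values and taking the argmax mimics the peaked likelihood curves shown on the slide.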
So we can predict the perpendicular correlations, but also four-point correlation functions between two different pairs of birds at a given distance. Then the question is: can we actually get global order from local communication? Why local communication? Because our model does assume local communication: only n_c birds take part. I haven't told you what n_c is — in principle it could be the whole flock — but we'll see that it's not; I'll show you in a second that it's much smaller, and that we do reproduce these long-range correlations simply from local interactions. In retrospect this maybe shouldn't surprise us that much, because — as other people have also mentioned today — there is a transition between order and disorder, and here we are in the ordered phase: the fact that the birds all move in one direction is a signature of order. If you think of this potential-well picture, instead of being able to move anywhere they want, they're stuck in one of these minima, and that sets the direction. But since we're in three dimensions the picture is more complicated: you can't move freely in most directions, but there is still one direction along which you are free to move. That zero mode, which gives you the scale-free fluctuations, comes from being able to move along this direction without paying any price. And its origin is the global orientational symmetry of the flock: there is no preferred direction for a flock. They move in a polarized way, but whether they move in this direction or that direction — nothing in nature intrinsically tells them this is the only direction in which they can go.
And that is the origin of the fact that there is no scale in the problem. That's why we get these scale-free, very long-range correlations: nothing in this problem sets a scale. Maybe I should have had this slide earlier, because now we can ask: what are these local interactions? What is the scale? How many birds talk to each other? And is the interaction metric or topological? Meaning: a metric interaction is one where each bird takes into account all the birds within some radius; if the flock becomes denser, the bird listens to more birds. A topological interaction is one where the bird always listens to a fixed number of birds in its neighborhood; if the flock gets denser, it listens to birds that are physically closer to itself, but it still counts up to the same magic number. The consequence is that in the metric case the number of interacting neighbors grows with the density, while in the topological case it remains constant. If we now look at the data, we see that birds in fact have topological, not metric, interactions: the inferred neighbor count does not depend on flock size, nor on flock density — it's always the same — and the number of birds each bird listens to is 21. Okay, so it's not a super-small number, but 21 for a flock of 5,000 is still a pretty small number of birds. It's still local interactions that give you these long-range correlations. And you could say: okay, but that's because you learned a model like that. What if you'd learned a metric model, where birds listen to other birds within some fixed distance? Would that change anything? And the answer is no.
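The metric-versus-topological distinction can be checked directly on positional data; here is a small sketch (illustrative, not the actual analysis) of the two neighbor definitions:

```python
import numpy as np

def metric_count(positions, radius):
    """Average number of neighbours within a fixed metric radius."""
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return (d2 < radius ** 2).sum(axis=1).mean()

def topological_range(positions, k):
    """Average distance to the k-th nearest neighbour: the physical span
    a bird covers when it always listens to exactly k neighbours."""
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.sqrt(np.sort(d2, axis=1)[:, k - 1]).mean()
```

Compressing the same flock leaves a topological bird's count fixed at k (only its interaction distance shrinks), while a metric bird's neighbor count grows with density — which is exactly the signature used to discriminate the two cases in the data.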
If we learn a metric model, we then recover a dependence on density, which again shows that the interactions are topological. Okay, so that was direction. But a velocity is composed of a direction and an amplitude, which we call speed. And if we look at the correlations between the speeds of different birds in a flock, we also see scale-free correlations: the correlation length — the size of the domain of birds moving at roughly the same speed — scales linearly with the flock size. Now you might say: okay, I'm used to all of this, I'm used to you showing me results like that, so I shouldn't be surprised. But this time you should be surprised. The reason we got scale-free correlations in the flight direction is that there was a symmetry associated with direction: the whole flock can move in this direction or that direction, it doesn't really matter — there's no preferred direction in space. But speed is more complicated: speed has physical bounds. That's what this animation, which I actually stole from Thierry, shows — a bird cannot fly as fast as a plane. There are physical bounds on how fast a bird can go. So you expect speed to be constrained, not free to do whatever it wants. So what's going on? We're going to try to figure that out. We're going to build the same kind of model, but now constraining the local alignment: we'll ask how similar the velocity of each bird is to that of its neighbors, taking into account the actual amplitude — the speed — and we're going to learn a maximum entropy model.
So again, we're going to learn a model, this time for the distribution of full velocities, which is the most general one that constrains the mean velocity and also the second moment of the velocity — we generally want to constrain the distribution of velocities. Going through the math, we get a similar kind of model, with a term that corresponds to coordination with the neighbors — our interaction matrix, with the number of neighbors you listen to again parametrized by n_c — and a second term that describes individual speed control. The first is a term where you look around and listen to what your n_c neighbors are doing; the second is a term where you compare yourself to some externally set reference speed. Basically, you're looking at your speedometer: you know there's a speed limit of 80 kilometers per hour and you want to go as fast as possible, so you try to keep to 80. That's your speedometer, and the other term is all your friends in their cars trying to do the same thing. So now we have to learn this, and we do it in the same spin-wave approximation, which just assumes the flock is polarized. Technical details aside, this is a picture of the birds listening to each other while trying to keep to this externally set gold standard. The interesting thing is that if you do assume the flock is very polarized, the probability distribution factorizes into two independent terms, one for the orientation — the direction — and one for the speed. So these two things are separately controlled. And again everything remains Gaussian, which makes learning easy. And now we have three parameters to learn.
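Schematically — up to conventions that differ between write-ups — the resulting energy function combines the two terms just described:

```latex
% Schematic energy function for the speed model: alignment with the
% n_c neighbours (coupling J) plus individual speed control (strength g)
% toward a reference speed v_0; n_{ij} = 1 if j is among i's n_c
% nearest neighbours, 0 otherwise
H(\{\vec{v}_i\}) \;=\; \frac{J}{4}\sum_{i,j} n_{ij}\,
    \big|\vec{v}_i - \vec{v}_j\big|^2
  \;+\; \frac{g}{2}\sum_i \big(|\vec{v}_i| - v_0\big)^2
```

The first sum is the "friends in their cars" term; the second is the speedometer term, and the ratio g/J is what the fit below turns out to make very small.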
So J and g are fixed by matching the velocity fluctuations — the moments we were constraining — and n_c is learned from the maximum of the likelihood. And it works: we can do the fit, and again we can predict correlations that we did not put in when learning. What do we learn? We learn that the ratio of the parameter for the individual speed control — getting the speedometer right, the external speed constraint — to the parameter for listening to your neighbors is very small. That means the birds listen to their neighbors much more than they try to get the speed right by themselves. And in fact this small value means the system is critical — that's essentially just the word for it. If we take this parameter from relatively high values down to lower values, we see that we reproduce the correct behavior of the speed correlation function in the data only at very small values. And these very small values are the values that give us the long-range correlations, because large values would give a very fast decay of correlations — correlations would only be fit over very small distances. To fit them over very large distances, we have to have a small value of this parameter, which means paying much more attention to our neighbors than to ourselves. And if we take this model and turn it into a continuum version, we can actually calculate within this model what the correlation length is, where a is the average distance between birds. As Andrea was saying, this is a very complicated thing, but within this effective model — and I stress effective — we do have an idea of how it behaves. We see that if this parameter — how strongly each bird focuses on getting the speed right on its own — goes to zero, the correlation length goes to infinity. Infinity, of course, doesn't happen; it goes to the size of the flock.
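In the continuum version, up to O(1) prefactors, the divergence just described takes the form:

```latex
% Continuum estimate of the speed correlation length, with a the typical
% distance between neighbouring birds; as the individual control g -> 0,
% the correlation length diverges (in practice it saturates at the
% linear size of the flock)
\xi \;\sim\; a\,\sqrt{\frac{J\,n_c}{g}}
\qquad\Longrightarrow\qquad
\xi \to \infty \quad \text{as} \quad g \to 0
```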
But it means that all the birds are essentially relying on each other rather than on themselves. So we learned this parameter — what are the implications? It basically means that birds really don't try to get the speed right by themselves; they rely on listening to the whole flock to get it right. And they do get it right: the fluctuations in speed are very, very small. So they manage — it's not a bad strategy. And why might it be a good idea to do something like that? Well, in this regime, which is generally called critical — again, as Offer was saying — the susceptibility to external perturbations is very large. It means that if a predator attacks, the whole flock can respond, because of these long-range correlations. So that was the static picture. Then you can say: of course, birds are dynamic and things are much more complicated — and that is true, and that's coming tomorrow. Okay, so I just wanted to show you that we can take data and learn models, which then show that you do get global order from local interactions that are topological in nature, not metric — and which predict that the flock is very susceptible to perturbations. Okay, thank you.