Can you hear me? Thank you very much for the invitation to be here. Today I will be telling you about our unpublished work, so you can give me feedback. In the lab, we do three things. I will always say I do three things, because if I say more, we get lost. So I will always try to say three things for everything. One of the things we do is to model behavior, and I will be talking about modeling collective behavior. We are also neuroscientists, so we are looking at brain activity while the animals are in groups or interacting with other animals. There is another path in the lab that is exploring new approaches to machine learning, or AI. I will be talking about modeling collective behavior, but I will tell you one minute about our work on brains by just showing you a video. What we have here is an animal, a zebrafish. This one might be nine or ten days old. It is embedded in agarose. Agarose is transparent, and the animal is intact; we don't open the animal, but it is trapped in agarose. We can actually remove a bit of agarose so it can move the tail. We can also remove agarose from around the eyes, and then it can move the eyes and the tail. In this particular video, we didn't remove anything, so the image is a bit clearer. What you are going to see is the brain activity. We are genetically expressing calcium indicators, in this case throughout the brain. So you will see calcium changes that come from the brain activity, and in red there will be the tracked trajectory of another fish around this one. Because the brain turns out to respond on the contralateral side to the other fish, I flipped the trajectory, so it is easier to see the other animal close to the brain activity on this side. So I'm going to run the video. Let's see what the image is. This is raw data, not processed data. So you can maybe already imagine the correlations between the position of the other animal and which part of the brain becomes active. This is normal epifluorescence.
We can do better, but this is just to show you. If you treat the data in a very simple way and ask which part of the brain is responding to which position of the other animal (say these are the positions of the other animal, in different colors), then, for example, this part of the brain is responding to this position of the animal, and this one to that one. So that is our reason to choose zebrafish: we can do behavior in collectives in the lab, we can manipulate the brain, and we can record brain activity. It is quite a powerful system to use. As I explained before, another part of the lab is trying to explore new approaches to AI. In particular, we are trying to see how far we can get doing some kind of machine learning without function minimization. If you are curious about what we are doing, there is an arXiv paper about our approach. It is based on modern algebra, and it does not use functions. Anyway, today I will stick to modeling behavior. In the past, we have done it in different ways. Our philosophy has always been, similar to many of you, to use very simple models that could give insight into what the animals are doing. And I want to explain a bit of that philosophy to basically make myself my own enemy and say, OK, there is a limit to what you can do with very simple models. Today I will explore with you what happens if we go into very complicated models, and how best, maybe, to do it. Anyway, the type of thing we have been doing, for example, because I was a physicist two lives ago and maybe a neuroscientist in my previous life, is to produce models based on ideas from neuroscience. One idea we have been playing with is that what an animal is doing is estimating the probability that there is something out in the world, given what it has seen and given the behaviors of other agents. We have been working with that idea using very simple models.
And it has been quite successful for very simple experiments, though it doesn't give you the whole picture. We have been trying to model other parts of collective behavior. One of them is how an animal approaches another, using control theory for coordination, basically using the idea of minimal time in a burst-and-glide fish. For animal fights, we have been using game theory. We have been using heuristic rules, and in other cases very simple models. Let me illustrate one of the more complicated ones, but it is still very simple. It comes from the idea that what brains are doing is to estimate something out in the world, not only using private information but also the information in what other agents are doing. With some assumptions, you can come up with very simple predictions for experiments. In this particular case, we have an experiment in which a focal animal is choosing between two options, X and Y, with different numbers of animals at X and at Y. This is the formula we get from the idea that animals are estimating. It has two variables, the number of animals at X and the number of animals at Y, and three parameters, K, S, and A. Here I am plotting N_X and N_Y, and my formula gives me this surface, which depends on the values of the parameters. I can change in the experiment the number of animals in each of the two options. So imagine I have two animals at X and one at Y; that would be the prediction, and you do the experiment and compare. So you see, it is only two variables and three parameters, and for me it is already a bit of a nightmare model. I mean, from the theory you need to simplify to get this simple, but it still has a few parameters. So why am I not happy with this, and why am I going to spend 30 minutes talking about something else? The reason is that, as you can see, the experiment is very simple, and the theory has been matched to the simplicity of the experiment.
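To make the shape of such a model concrete, here is a minimal sketch in Python of a choice rule with two variables and three parameters K, S, A. The sigmoidal form and the default parameter values are illustrative assumptions, not the exact formula from our papers.

```python
import numpy as np

def p_choose_x(n_x, n_y, k=1.0, s=1.0, a=2.0):
    """Hypothetical choice rule: probability that the focal animal picks
    option X given n_x animals at X and n_y at Y. The sigmoidal form and
    the parameters k, s, a are illustrative, not the published formula."""
    ratio = (n_x + k) / (n_y + k)
    return 1.0 / (1.0 + s * ratio ** (-a))

# Prediction surface over group sizes at X and Y, to compare with experiment
surface = np.array([[p_choose_x(nx, ny) for ny in range(6)]
                    for nx in range(6)])
```

With three parameters fit once, the same surface predicts every (N_X, N_Y) condition, which is what makes the simple experiment a clean test.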
And that is why it can be very simple. I think that is a very fruitful approach, but it is limited to always doing very simple experiments. So what happens if we now do a more complicated experiment? Can we use these types of models? In general, no, we cannot. They capture some part of the behavior, but not the totality. Imagine you want to apply the previous model to animals moving in a group. Say I put my zebrafish in a tank and let them move, and I try to predict, at each point in time, whether the animal is going to move to the right or to the left, depending on the configuration of the other animals. Then I plot the accuracy of my model. 50% is chance, so 70% is something: you captured that with maybe one or two or three parameters, but you are far from very good prediction. So what can you do? This is fake data, by the way, just to illustrate the point. You can try to build models with more variables and more parameters. Now, this is again a bit trivial, because at some point you are going to have enough variables and parameters to fully explain your training data; that is what happens. With enough parameters, you can explain whatever you want. So one requirement is going to be that we use the training data only to fit the models, and we test them on new data. We are not happy fitting the data; we are happier if we can predict what the animal is going to do in new data. Now, I will go all the way to a large number of variables, but I will also argue that we should play with the systems to reduce the number of variables, so we can get some insight. So today I am not going to argue that you should always aim for full insight with very few variables, and I am not going to say that you should always use these very large systems either.
So you are better off always being on that curve, either getting lots of insight with very simple models or more ability to predict with larger models, but you should never be in the stupid region of modeling. For today, I am not going to do a talk on the work we have published, but only on how far we can get using deep nets. There are reasons to use deep nets. One is that for many problems they are the most general function approximators we have, so they are very good at approximating data. In the last few years, say since 2017, there have been many practical advances in using deep nets, and they keep coming out all the time. There are also reasons not to use them, and probably you have those reasons. One is that there is a bit of black magic in how to use them, and also they can act as black boxes. So during the talk I am a bit divided. I believe there is some value in using them, but you will decide by yourself whether you are one of the people who want to use them or not. It is true that there is some black magic, but there is actually some black magic in all modeling. And to some extent they are black boxes, but we will play with them to see how far we can get. I will try to talk about three things; maybe I won't have time for the last one. One is to give you an update on our tracking system using deep nets. Another is how, from the data in this tracking system, we can build models using deep nets. For the last one, if I have time, I will tell you about our efforts in using reinforcement learning to model collective behavior. Our idea for tracking, a few years ago, was that, as Stefania was saying, when two animals go into a region of occlusion, to know who was who afterwards you would look at the images. There would be a way of looking at the images that would give you a fingerprint of the animal. So this is George, and this is Tom.
And just by analyzing the images afterwards, you will see that that was Tom and that was George. We had a way of doing it using a kind of texture analysis that was invariant under rotations and translations. You can see a video here, with 2D space and time, showing a trajectory for eight animals, in color where the tracking system agrees with a human validator, and in black, which you might not be able to see, the little regions in which they disagree. Now, some people use this system, but it has a limitation. Guy has probably been the one pushing it to more animals; I think they tried maybe 20 or 40. How far did you go? How many? 20. Yeah, so that is about the limit of this system. Beyond 20, you need to be very patient to get the results: it is very slow, and it might crash. So we thought a few years ago, why not use convolutional neural networks? These are deep networks that, instead of the typical all-to-all connections of a classification network, have some convolutional layers. These are filters that look at the image in little patches, and they are designed to be translation invariant. They are well known; everyone is using them. OK, so let's test the idea of whether convolutional neural networks can distinguish animals. In general, we do not have a way, for all animal species, to find the nose, say, or the head. We can do it in zebrafish, but we thought that in general it cannot be done. So all we do is put the elongated region of the animal along the diagonal. It can be one way or the other, because we are not distinguishing head from tail. We train the network with all these images, irrespective of their orientation, and we try to see, by giving these inputs to the network, whether it can learn the identities of individuals. To test the idea and obtain some ground truth, we obtained videos. The image is not very good.
They are videos of 184 animals in independent little tanks, so we know who is who. As our ground truth, we have all these images for the 184 animals, about 10,000 of them. This is our library to test any network. So how do we test it? We take 3,000 images: 2,700 for training, and 300 as a validation set to stop the training while it can still give good predictions of who is who on new data. Then we test on another data set of 300 images. For that test data set, we obtain the following single-image accuracy as a function of group size, the number of individuals that need to be identified: 2, 10, 30, 60, 80, 100, 150. idtracker is somewhere here, and the new system with CNNs is much better. This is not to say that idtracker was really bad, because this is for a single image; between two crossings you have many images, so you can estimate better who is who. But it means that with the new system you need very few images between two crossings to know who is who. And as you can see, magically, it does not degrade much for very large groups. OK, so this is a little proof that the system can cope with distinguishing 150 of my fish. Now, this is not to say that it would work on a video, and the reason is that we are training the system with 3,000 images. In a video, I am not going to have 3,000 images per individual straight away. So I run into a huge problem: I need to obtain my 3,000 images per individual. I will explain how we do it, but this is for you to see the complexity of the problem. The animals are crossing very often, so it would be unlikely to have a portion of the video with 3,000 images per individual. By the way, these are 30-day-old zebrafish. OK, so the core of our tracking system is really here: it is a quality-check procedure to gather together many training images.
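The train/validation/test split just described can be sketched as follows; the image size and the shuffling details are assumptions for illustration, not the exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_images(images, n_train=2700, n_val=300, n_test=300):
    """Shuffle one individual's image stack and split it into a training
    set, a validation set (used only to stop training early), and a
    held-out test set for the final single-image accuracy."""
    idx = rng.permutation(len(images))
    train = images[idx[:n_train]]
    val = images[idx[n_train:n_train + n_val]]
    test = images[idx[n_train + n_val:n_train + n_val + n_test]]
    return train, val, test

images = np.zeros((3300, 52, 52))  # one individual's crops (size assumed)
train, val, test = split_images(images)
```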
And I will run you through it so you can modify it yourself for your needs. After the preprocessing, I told you we have a crossing detector. The crossing detector is another network whose job is to find in the video when two animals are crossing. We have some heuristics to find some of these crossings, basically based on the number of pixels in the blob, and on whether the blob is going to split into two blobs afterwards or came from two blobs in the past. From those heuristics we obtain a training data set of crossings from the video and another one of individuals. Then we have another network; it is a bit simpler. We train it with only two outputs, whether an image is a crossing or an individual, using the training images obtained with our heuristics. The network then tells us which images in the video correspond to crossings and which to individuals; our test set in this case is the whole video. And this is what you find. On the y-axis I am plotting my 100 animals in the video, and this is the frame number. In black are the little fragments of the video in which the network tells me the animals are crossing, and in white when they are individuals. So this is the result of that crossing network. OK, now what we need is to identify all the animals in between, and as I was telling you before, the way we do it is by carefully adding images to a training data set. Let's start with the first protocol, which already works for simple videos. For more complex videos it jumps to a new protocol, and for the super complicated ones, which I never want to be in, to protocol three. OK, so the first thing to do is to find the portions of the video in which animals are not crossing. Typically, in a 10-minute video of our fish, we have 100 of those, some of them very short, but still 100. For example, by eye I can see three here.
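Finding those portions can be sketched as follows, assuming we already have the crossing network's per-frame labels; the function name and array layout are illustrative.

```python
import numpy as np

def global_fragments(is_individual):
    """Sketch: given an (n_animals, n_frames) boolean array that is True
    where a blob is a single individual, return the frame intervals in
    which no animal is in a crossing, the 'global fragments'."""
    all_ok = is_individual.all(axis=0)   # frames where every blob is an individual
    fragments, start = [], None
    for t, ok in enumerate(all_ok):
        if ok and start is None:
            start = t                    # a crossing-free interval opens
        elif not ok and start is not None:
            fragments.append((start, t)) # the interval closes
            start = None
    if start is not None:
        fragments.append((start, len(all_ok)))
    return fragments
```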
So I ask the system to find one of those, the one in which the animal traveling the least still travels the farthest in the video. I find these global fragments, and all I do is train my network on that data. So I have images for animals number one up to 100. There are very few, but that is how I start; this is my training data set. I keep something for validation so I do not overtrain. I train my network to identify all the individuals, because in these fragments they are not crossing. Then I ask how the network will assign the rest of the video; this is my test set. Now, it is going to assign the whole video, but in general the assignment is going to be bad, so I want a quality check. I will not go into the details: it has an internal way of finding whether it is certain about an assignment, and it checks, among other things, that no two blobs are identified as the same animal. Out of that quality check, I only retain the global fragments that have high quality. In this case, they are these ones here, and my quality check also asks: are they covering a lot of the video? No? OK. That is protocol one, which I don't like, but you can still use it. The accuracy will not be close to 100%, maybe it is like 90%, but if you want fast tracking, this might be a way to do it. You will need to go into the code and change one parameter in the quality check, though, because in our case we do not stop there. The system jumps to protocol two, which is the one I really use for most of my videos. In protocol two, all that happens is that I now use these accepted global fragments as the training data set for my network again. I train the network and assign the rest of the video again. Now I have more global fragments that have passed the quality check, but they still do not cover the video as much as I want, so I ask the system to do it again and again. It does steps three, four, five, and by step nine it has covered most of the video.
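The whole accumulation loop can be sketched like this; `train_fn`, `assign_fn`, and `quality_fn` are stand-ins for the network training, the assignment of identities, and the certainty checks, and the coverage threshold is an illustrative value.

```python
def accumulation_loop(global_fragments, train_fn, assign_fn, quality_fn,
                      coverage_target=0.9, max_steps=10):
    """Sketch of protocols one and two: train on the accepted global
    fragments, assign identities to all fragments, keep only assignments
    passing the quality check, and repeat until the accepted fragments
    cover enough of the video."""
    accepted = [global_fragments[0]]   # start from the best global fragment
    total = sum(len(f) for f in global_fragments)
    model = None
    for _ in range(max_steps):
        model = train_fn(accepted)
        assignments = assign_fn(model, global_fragments)
        accepted = [f for f, a in zip(global_fragments, assignments)
                    if quality_fn(a)]
        if sum(len(f) for f in accepted) / total >= coverage_target:
            break   # enough of the video covered; otherwise protocol three
    return model, accepted
```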
The internal quality check says, OK, this is enough of the video, and we are done with the protocols. In the difficult cases we jump to protocol three, but here for fish I never had that situation; it was always solved with protocol two. OK, so we are here at step nine. For protocol three, I will mention only that sometimes the network cannot just move along the video, and I have to force the convolutional part of the network to learn about the whole video, and then retrain only the classification part in a second step. So I force the network to see the whole video, but most of the time we do not use it. OK, so if I zoom into my video, you will see that there are some little parts I did not identify, and also the crossings are still there. So we have another part of the algorithm that finds the assignment of those little fragments and also estimates accuracy. It is a conservative method to estimate accuracy. You cannot see the histogram here very well, but the mean value of the estimate is 99.95% accuracy in the identification of all the animals, and when you have a human validator it goes up, because the measure we use is conservative. It goes to 99.997%, so out of 100,000 frames you will have three problems with the identification, and typically it is animals that are close, and for one frame. OK, so we learned that it works quite well, but then we also need to solve the crossings, so we have another algorithm for that. Once you go through it, in the end you have, oops, sorry, you have your trajectories from the tracking system. I am going to show you two cases: the one I was showing you before, our fish, and ants. The case of ants is a difficult one.
It actually jumps to protocol three, and the reason is that, as you can see at the beginning of the video, many of the ants are still, so the training system basically has only one image per ant and does not have enough variability to learn more about them. So I need to force the system to learn a bit, but you will see that at some point they start to move. This situation was impossible for the previous idtracker; with the new one it is possible. OK, so let me quickly tell you that the code is open source and has a nice GUI and documentation. Let me quickly move to a way of building models using AI in a data-driven way. What we want to do now, from our videos, is to see how we can, for example in this case, predict the position of this animal given the positions, velocities, and possibly accelerations of the other animals. So imagine that I want to predict the future of this animal: the animal is going to be here in one second, and my prediction is going to be there. If you now ask a deep net to learn this problem, it is actually a very difficult problem to learn; a naive deep net will not learn it. But in 2017 there were improvements on deep nets that make it learnable, so you can see the real position and the predicted one. I am going to tell you very quickly how this works. The ideas are in this paper; it is a method that was used to mimic physics, and we are going to use it for modeling fish. On the left you have the physics implementation, on the right the deep net implementation, and you can see that they match closely.
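As a preview of the structure I will unpack below, here is a minimal sketch of this kind of pairwise-interaction-plus-aggregation network; the hidden layer sizes are assumptions, and random weights stand in for the trained ones.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(sizes):
    """Tiny MLP with random weights, a stand-in for a trained network."""
    weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for w in weights[:-1]:
            x = np.tanh(x @ w)
        return x @ weights[-1]
    return forward

# G: pairwise interaction (2 focal vars + 4 neighbour vars -> 128 features)
G = mlp([6, 64, 128])
# F: aggregation of the summed pair features -> predicted future position
F = mlp([128, 64, 2])

def predict_future(focal_vars, neighbours):
    """focal_vars: (2,) speed and normal acceleration of the focal.
    neighbours: (n, 4) relative position and velocity of each neighbour.
    Summing G over pairs makes the model permutation invariant."""
    pair_feats = sum(G(np.concatenate([focal_vars, nb])) for nb in neighbours)
    return F(pair_feats)

pred = predict_future(np.array([1.0, 0.0]), rng.normal(size=(15, 4)))
```

Because the sum does not care about the order of the neighbours, swapping two fish leaves the prediction unchanged, which is the physical symmetry the 2017 methods exploit.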
OK, so it works quite well. If you look at predicting the turning angle of the animal one second in the future, the system already works quite well with no neighbors, because just by knowing that the animal is accelerating to the left you already know a bit about the future. But if you add five, ten, or fifteen neighbors, this network has a very high accuracy at predicting whether the animal is going to turn right or left, close to 90%, and that is the maximum we have achieved with any network. You can use this network for data analysis: you can plot the probability of the focal going to the right and find configurations in which the animal will go to the right. But also, because the model is an interpolation, you can go there and say, OK, I want George to have a different velocity and a different position, and I want to find out whether the focal animal is going to move to the right or to the left. You can play with the model and get some intuition about which variables matter. But let me go through the theory so you have a sense of what we are doing. The network can be represented in this way. It looks like a little monster, but I am going to introduce it slowly. It has one object here, a little network that is going to describe the interaction of my focal animal with another animal. I am going to use a coordinate system centered at the focal, so the focal is at (0, 0) with its velocity pointing along the y-axis. The variables I am going to consider for the focal are its speed and its acceleration in the direction perpendicular to the velocity. For the neighbor, we have tried different variables, but the ones that seem to work quite well are the position of the neighbor with respect to the focal and its velocity with respect to the focal velocity.
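That change of coordinates can be sketched directly; this is the standard rotation into the focal's frame, with the function name my own.

```python
import numpy as np

def to_focal_frame(focal_pos, focal_vel, neighbour_pos, neighbour_vel):
    """Sketch: express a neighbour's position and velocity in the focal's
    frame, with the focal at the origin and its velocity along +y."""
    angle = np.arctan2(focal_vel[0], focal_vel[1])  # rotation putting v on +y
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return R @ (neighbour_pos - focal_pos), R @ neighbour_vel
```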
Now, the network we are using here, in the simulation you just saw, is a very rich one. It has six variables as input but 128 outputs, and the reason to have so many outputs is that this network is only describing the interaction between two animals; we need another part of the network to aggregate all these interactions, and for that we use this rich output. So what we have after these little networks G is a sum over them for my 15 neighbors, which has 128 outputs, and then another network that takes the 128 outputs and transforms them into a future position of my focal animal. OK, so that is a nightmare black-box problem, because basically you have six variables per neighbor, we were using 15 neighbors, and there were two variables for the focal; that is 92 variables and lots of parameters. Can we do better at getting some insight into what a model of the fish might be? We can, but let's go slowly. One insight into why this network works is that it has some structure: it is not an all-to-all network of neurons. The structure starts by describing the interaction between a pair of animals, one focal and one neighbor, and the rest of the network is an aggregation of that. So let's try to mimic that network with a simpler one that we can understand. The interaction part we can understand a bit: it has six variables, but it is not super complicated. Instead of having 128 outputs, I am going to have a single one, and I am going to train it to be the probability that the focal animal goes to the right, given the variables of the focal and of that neighbor. So basically I am now transforming the six inputs into a single output instead of 128. Let's plot that. This is the result of the network. It has six variables, so we would need to look at a six-dimensional plot, and I am going to do it in the following way.
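A sketch of this reduced pairwise network, a small MLP with one sigmoid output; the hidden size is an assumption, and random weights stand in for the trained ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# 6 pair variables -> P(focal turns right); sizes and weights illustrative
W1 = rng.normal(0, 0.1, (6, 32))
W2 = rng.normal(0, 0.1, (32, 1))

def p_turn_right(focal_vars, neighbour_vars):
    """focal_vars: (2,) speed and normal acceleration; neighbour_vars: (4,)
    relative position and relative velocity. Returns a single probability
    instead of the 128 features of the full interaction network."""
    x = np.concatenate([focal_vars, neighbour_vars])
    h = np.tanh(x @ W1)
    return float(1.0 / (1.0 + np.exp(-(h @ W2)[0])))

p = p_turn_right(np.array([1.0, 0.0]), np.array([0.5, 0.5, 0.0, 1.0]))
```

Collapsing 128 outputs to one probability is what makes the six-dimensional plots that follow possible.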
The reason it is not that difficult to look at six variables in this case is that they mean different things. Two are space: the position of the neighbor in X and in Y. The focal has two variables: one is the speed, so if it has higher speed I draw a larger arrow, and if it has an acceleration to the right I draw an arrow to the right. Here there is no arrow to the right, so the acceleration to the right is zero. And then I have these little diagrams, which plot the velocity of the neighbor: here the angle of the velocity vector of the neighbor with respect to the velocity vector of the focal, and here its speed. These plots give you the probability that the focal animal goes to the right, in red, and to the left, in blue. OK, so what does this mean? It means that if a neighbor is in this position of my plot and the neighbor is moving to the right, what my focal animal is going to do is just move to the right. That is what this plot says: an animal here in space, with that velocity vector, so high speed and moving to the right, makes my focal animal move to the right. If it is in the blue part, it means the neighbor is moving to the left, again at high speed, so my animal moves to the left according to the plot. Now it gets more complicated and more interesting. There are some portions of space in which my neighbor is not moving at high speed and, independently of whether it is moving to the left or to the right, my focal animal moves to the right. So there are parts of space where the neighbor, when it is not moving fast, produces only attraction. Here it is going to align, but here it is only going to attract. There are also regions, you cannot see it in this plot because the quality of the projector is not that good, where this part here was blue.
You see it as white, but it is blue, which means that there is repulsion from the neighbor to the focal when the neighbor is going at low velocity; high velocity does not produce repulsion, but low velocity does. Now, another variable was the velocity of my focal animal. If I plot the same thing as before but for my focal going at low velocity, you can see that only this region remains important; there is still repulsion in some regions, the repulsion is more important than before, and it is more restricted in space. So how the focal reacts to a neighbor also depends on its own velocity, and likewise on the acceleration normal to the velocity. When the animal is turning to the right, only a neighbor at very high velocity pointing to the other side will make it change direction, and the whole plot turns more red. OK, so far we have been looking at the simple interaction between the focal and one neighbor; before, we had another network afterwards to take the aggregation into account. That was very complicated, so now we move to a different kind of network, called an attention network, which was introduced recently. The idea is to keep the network we were talking about before, but now there are weights: it weights the animals with another function that has to be learned to produce a good probability of moving to the right or to the left. We found that this weight depends mostly on four variables: the speeds of the two animals and the position of the neighbor. So I am going to show you what the system learns to produce as these weights, which are relative, normalized weights. For the exponential of this new function, one obtains something like this. It is the same plot as before, but now there is no dependency on the angle; it finds these regions of interest, weighting the animals that are in those regions.
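The relative weighting can be sketched as a softmax over learned scores; here the scores and pairwise logits are set by hand rather than produced by the learned four-variable function.

```python
import numpy as np

def attention_aggregate(pair_logits, scores):
    """Sketch of the attention aggregation: each neighbour contributes a
    pairwise right/left logit, weighted by a learned score. The weights
    are the softmax of the scores, so they are relative: a neighbour
    matters only compared with the others."""
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return float(np.dot(w, pair_logits))  # aggregated logit for "turn right"

# Two fast neighbours in a high-score region dominate the decision
logits = np.array([2.0, 2.0, -1.0, -1.0])
scores = np.array([5.0, 5.0, 0.0, 0.0])
```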
So animals at high speed in these particular regions will be important for a focal animal at high speed. The way you read this is that if, for example, two of the 15 animals are up there, their score is very high, and if the scores of the others are lower, it is very likely that these two win out and my focal follows the direction of those two. But depending on where the others are, that might not be the case: if the others together carry higher scores for moving to the right, then in the end those will win out and my focal will move to the right, because all these are relative weights. And again, for low velocity, the way you aggregate information is different: you do it more locally, not with far individuals. OK, so I have one more thing, but probably I am late, no? Do I have some time, four minutes? We do not have very good results for the next thing, so I can be quick. We are using deep nets, as I showed you, for our tracking system and to build models in a data-driven way, but also in a top-down way, as I will describe quickly here. Basically, we use agents that have a little brain, and we use reinforcement learning, which is a method to change the parameters of that little network according to a score we give to the system. The idea in reinforcement learning is that the world is in a particular state that is observed by the agent, the agent takes some actions that modify the environment, and then the agent reacts again to the environment. So in this scenario you will have a world in which there are other agents.
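This loop of state, action, and environment update, together with the kind of hand-designed score we give the system, can be sketched as follows; the dynamics, a rotation-plus-cohesion score, and all constants are illustrative assumptions.

```python
import numpy as np

def step_world(positions, velocities, actions, dt=0.1):
    """One step of the agent-environment loop (sketch): each action is a
    velocity change, and simple physics integrates the positions. The
    real agents instead act through a speed and two angles in 3D."""
    velocities = velocities + actions * dt
    positions = positions + velocities * dt
    return positions, velocities

def milling_score(positions, velocities):
    """Illustrative score rewarding rotation and cohesion: angular
    momentum about the group centre, minus a dispersion penalty
    (the constants are made up, not tuned values)."""
    rel = positions - positions.mean(axis=0)
    ang = np.mean(rel[:, 0] * velocities[:, 1] - rel[:, 1] * velocities[:, 0])
    return abs(ang) - 0.1 * np.mean(np.linalg.norm(rel, axis=1))

# A ring of agents moving tangentially scores high on this measure
pos = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
vel = np.array([[0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 0.0]])
score = milling_score(pos, vel)
```

Reinforcement learning then adjusts the parameters of the little brain so that the actions it emits push this score up.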
I produce a retina for my agent that has 160 values on a sphere, in this particular case, and a very little brain of three layers. The actions it can take in 3D space are very simple here: the speed and two angles. Afterwards there is some physics that describes the movement of the animal given the actions it has taken, and the animal needs to learn to modify these nets to maximize a score we give to the system. It used to be a nightmare to really use these types of agents, but there are little tricks, developed recently, at the end of last year, that make this much faster. The type of question you can ask is, for example: does there exist a sensorimotor transformation, a particular set of parameter values, that produces behavior X? So now we basically have a rich model that can be many types of models, and we want the animal to do behavior X. We say, OK, do behavior X, and I find out whether it is possible to find a particular brain, a sensorimotor transformation, that produces behavior X. As an example, I am going to show you some not-so-good calculations on the question: do we need to add a blind angle to this particular agent to get a milling pattern in the behavior? And the answer is, well, I do not really know, but this is the answer numerically. Numerically, you can manage with a score that gives the animal a higher reward if it produces milling, basically a high angular momentum, plus some attraction and some repulsion so the animals learn to stay together, and you get this. I am not sure I should call it milling, but they are rotating. They are still learning as we speak; they are still on their way to producing milling. I believe this type of modeling will be very important in the near future, because there will be tricks that make this faster, and better and better computers, and it is now getting fairly easy to produce a world for them that is reasonable. And with this, I want to finish by saying: yes, there
is black magic. There is black magic in all the modeling I do; there is a bit more, maybe, in these systems, because the trick is to add some structure to them that makes sense for the problem, so they can learn from the data. Sometimes they will be black boxes. We do not mind black boxes to some extent, for example for a tracking system, but we do mind for modeling behavior. In that case, I propose that instead of using the best predictive networks, we reduce their complexity in a way that we can still plot and understand them. So we can keep track of how far a network can go, we can reduce the number of variables to be able to plot it, and we can also go back to our models of one or two or three parameters. With that I finish, just acknowledging the people that did the work. There are really four people: Francisco Romero and Mattia Bergomi coded the tracking system, Francisco Heras did the modeling work using interaction and attention networks, and Robert Hinz did the experiments. Thank you.