As usual, wait for just a second, okay? And we are, yes, everybody seems to have come across from the last session. So welcome back, everybody, to the next talk of the day, the second talk of our final day and, as I see in our Crowdcast schedule, the 20th session of the conference. It is my pleasure and honor to introduce Rose Travis who, by the way, if you are following the conference on Twitter, is doing some amazing live tweeting for us. We're going to keep going with this theme of the role of data in science: she'll be talking to us today about big data in behavioral biology. So without further ado, please take it away.
Great, thanks Charles. And thanks for organizing the conference, both to you and to Luca; it's been really fun. And thanks to all the speakers. I've had a really good time and learned a lot about digital studies of science and also digital science. I'm going to be talking more about the digital science side of things. I haven't done any kind of digital study of this, aside from being in Zoom meetings, lots of Zoom meetings, with the research group that I work with. So it's coming out of an ethnographic study, but at the same time this project in particular is right at the beginning, and I haven't done any formal work investigating it. So I'd be really happy to get feedback and ideas on the project in the question time. So I'm going to be talking about big data in behavioral biology. We can start off with a blackbird. And you might wonder, what does a blackbird have to do with the International Space Station? Probably not much in normal cases, but this blackbird has a small device attached to its back. That is a radio transmitter. It's sending signals of the bird's position, coupled with a timestamp, to a receiver station on the International Space Station, which then sends that data back down to Earth to researchers in their labs.
That blackbird is part of a really big international research project called Icarus, which has been running for about 10 years now. It's an international cooperation looking at animal movement across lots of different sorts of species. Part of the project has involved developing the technology to do this. So they've had teams working on miniaturizing radio transmitters, and then also lots of teams out there attaching them to animals and looking at what happens. And because there's a receiver station on the International Space Station, it's funded by space agencies. It's funded by national organizations. It's a huge thing. And all of the data gets published on a database called Movebank. That's been around since before this project, but it's obviously received a huge input of data from it. And I encourage you all to go and look at that database because, if for nothing else, there's a beautiful visualization of all of the data that's coming out of these animals. And that's a screenshot from it, so you can see what's going on. So even if I haven't made any visualizations, at least somebody has, and you can get something out of it. What I find interesting about this project, apart from pretty graphics, is that there are a lot of interesting claims being made about the use of these devices and how they're changing behavioral biology. This is taken from the Icarus website, from a page called The Internet of Animals, which already raises some interesting points about digital science and digital nature. But what it's about is the way that this project is changing the science. And there are some interesting claims here that I'm going to investigate in a little more detail during the presentation. So they say things like: in the future, these transmitters will be able to tell scientists much of what they previously found out only after hours of observation.
The researcher in certain cases is sitting thousands of kilometers away in his or her research lab. The progress that they've made in this technology facilitates completely new insights into nature, and it allows tracking animal movement in real time and on a large scale with thousands of animals. So there are several points here: researchers being thousands of kilometers away from these animals moving around; the ability of these devices to cut hours, to make things efficient, to deliver this information to scientists so that they don't have to work so hard for it; and also the ability to generate new insights that maybe they wouldn't have been able to get with the regular methods. All of these are quite interesting and quite powerful claims. Obviously it's a little bit advertising speak, but it's also indicative of the way that biologists themselves tend to approach these technologies. So I want to start this presentation by tracking back a bit and looking at why these technologies are so interesting to biologists. And that's because they're such a big departure from traditional ways of studying behavior. Then I'll briefly introduce these technologies and look a bit into the kinds of benefits they deliver, because they are really good for behavioral biologists. But on the other hand, they're maybe sometimes not as good as some biologists would like to think they are. So I'll spend the final part of the presentation looking into how biologists deal with this movement data together with statisticians, and how that means that this idea of researchers sitting in labs just receiving movement data is too idealistic. To start with: naturalism in ethology. This is Konrad Lorenz and Niko Tinbergen. They're two of the founders of ethology. And they were well known for being great naturalists. They loved to go out into nature and see the way that animals were behaving.
And they capitalized on this love of exploring natural behavior and developed it into their method in ethology. So they encouraged their students to go out and to look at natural behavior and its natural causes and functions. That's what ethology was about, in contrast to some of the other, American ways of looking into animal psychology. They aimed to do this really detailed observation of animals in order to develop a familiarity with the habits of the species, with what it usually tends to do. And only then, they argued, could you start to figure out what the functions of different behaviors were, what their causes were, and how they fit into the animal's way of life as a whole. There's a lot more I could say about this, but this is enough to get the picture of where behavioral biology has come from, or at least one of the places it's come from. And if you'd like to see some fun pictures, I do encourage you to look up Konrad Lorenz on Google Images, because he loved geese. That's the one thing. Okay, so the ethological approach sounds painstaking because it is painstaking. Going and watching geese for hours on end is hard work. And in particular, it's not easy to get a more objective method when you're just going and looking at animals. As a result, a lot of tools have been developed to study animal behavior in a way that's objective in the sense of comparable across different individuals, different human observers. One of the most important tools is the ethogram. Ethograms are lists of behaviors that have been selected for being relevant to the species, each with a code. And that means that when you're observing the animal, you sit there with this sheet of paper and you write down the codes, with a time, for whatever behavior the animal was doing. So this allows individual researchers to quickly record what animals are doing.
But it also ensures that there's intercoder comparability, and they do tests for that. So you sit there with somebody else and you both look at the same animal and you check that you're both coding the same things. There's also bird ringing, which is not really an ethological tool, but it's a way to track animals and how they're going out into the world. So you attach an identifiable little plastic ring around their foot. And then people out there in the world who like watching birds will tell you when they see it. And this helps you keep track of where animals have gone, where they've maybe settled in a territory, where they're migrating, these sorts of things. That's again another way to keep track of animals that's a bit easier than just looking with your bare eyes. Finally, more recently, people have developed software and R code and things like this for processing videos. And this can help you to get computer-automated measurements for certain sorts of behaviors that are fairly simple, things like the jumping distance or jumping angle of a grasshopper in this case. So these are all pretty powerful tools; some of them are newer, some of them are quite old. The tools that I'm most interested in, obviously, are these technologies used for tracking animals. There are various sorts of technologies that are used for tracking movement. There are radio transmitters, accelerometers, GPS tracking devices, and there are others. But these are probably the three most dominant things that people use. They've actually been used since the 1950s or 1960s to track animals. There's a really good history of this by Etienne Benson. But it's only in the last 20 years that they've started to be used on a broad scale. And that's because, thanks to other technological developments, they've become much smaller and much less expensive.
And it's become much easier to get the data out of them remotely, rather than having to find the animal again and extract whatever information it's recorded in an internal processor. So because of these massive changes in technology, suddenly you've got thousands and thousands of studies being done using these devices. And you can see there, some of them are really, really tiny, like the one that was on the blackbird at the start of the presentation, the size of your finger. Some of them are a little bit bigger, like this one that's on the fur seal pup there. You can see it's about the size of the palm of your hand. But both of these are designed for a particular species, so that the device will fit it and won't obstruct its behavior as it's moving around in the environment. There are a lot of benefits that you get from using these devices. Obviously one of the biggest is that you can have a larger sample size. Because they're cheap, you can buy lots of these devices, and all you need is the manpower to attach them to the animals. That sounds simple; it's not always that simple. Obviously manpower in Antarctica is not quite as cheap as manpower in your backyard. But you can get much larger sample sizes regardless. You can go from, say, dozens of animals in a normal behavioral study, where you're sitting there and watching animals, to thousands quite easily. You also tend not to lose track of the animals as easily as you would with, say, bird rings. With bird rings you can ring a lot of birds, but you often lose them: you never see them again, you don't know what's happened to them. With radio transmitters, say, you attach one to the animal and it's constantly relaying information back to you even if a human doesn't see it again.
What I think is a more interesting benefit of these technologies is that you can get much more variety. And that's not just within a population, where a larger sample size lets you capture more of the variation. You can also study behavior under different sorts of conditions. It sounds trivial, but you can't watch what an animal is doing when it's underground, when it's underwater, or in a storm or inclement weather, unless you have special expensive technology like night vision cameras and things like this. So once you attach these devices to the animals, you're able to track them in places and under conditions that aren't easy for humans to observe. Part of this then means that you can study different sorts of species. In the past a lot of the behavioral studies were done on animals that were active during the day and that were easy to keep track of; maybe they had a fairly small territory. These days you can do this kind of large-scale tracking of animals of all sorts. So you can get studies of the fur seals, for instance: the mums go out for days on end, and in the past nobody knew where they went. Now you attach trackers and you can find out where they're going to forage. This is, I think, quite an interesting benefit, because as philosophers we are so used to thinking about lab science, where you're dealing with these tractable animals. Here you're looking at a different, broader set of species that are trackable, as Benson puts it. The third advantage, which I'll just cover quite quickly, is that you can get longitudinal data. In the past it was quite difficult to track the same animal over time unless you were able to ethically attach a label or a tag to it. These days you put the tracker on and the tracker does the re-identification for you.
And this longitudinal data is obviously very interesting for biologists who want to find out about long-term migration or developmental patterns in movement, these sorts of things. The final benefit that biologists also like to talk about is that you get reduced bias. And there definitely is an element of reducing human bias with these technologies. I mentioned before that behavioral biologists tend to work with ethograms. And even though you do have intercoder comparability tests before you go out and use them, to a large extent they're still biased in the sense that it's humans working with them. So it's humans trying to identify behavior, and humans are notoriously limited in terms of what they can see and what they pick out as a behavior to code. But also the code itself is biased, in the sense that it's humans who've designed this code and picked out these behaviors as interesting, and not added other behaviors that they don't find interesting or functionally relevant, even though they might in the end be quite interesting or quite important for this particular species or this particular individual. Tracking devices then overcome these sources of bias, because you're just getting whatever data is being transmitted from this individual moving around in the world. You don't have a human there deciding which data points are the most relevant. These are the benefits that biologists themselves talk about. You hear them in talks, at least I've heard them, getting really excited about all this new technology they've got and all the huge sample sizes and everything they can get with it. As philosophers or scholars of science, we can translate these direct benefits into benefits in terms of scientific values or virtues. So behavioral biologists with these tools are getting greater significance and greater representativeness in their samples.
They're also potentially going to be able to get more novelty and objectivity. And it's these two at the bottom that I'm more interested in. Is it really true that these devices help behavioral biologists to get more novel results, in the sense of results that aren't predetermined by theoretical assumptions or human biases? And then, as a corollary of that, are these technologies more objective, or do they give us a more objective science? To answer those questions, we have to look a bit closer at what movement data looks like and how biologists deal with it. I mentioned at the beginning that the sort of data that these trackers and transmitters produce is a position or acceleration data point coupled with a timestamp. So at certain time intervals, the device sends information about either where the animal is or how fast it's accelerating, depending on what device it is, plus the time. And the way the data sets look is something like this. This is data provided to me by Roland Langrock. He's a statistician at Bielefeld University in the group that I work with. He's a statistician, but he deals primarily with animal movement data. The first example is of a shark, and it's being tracked with an accelerometer. So that tracks how fast it's accelerating. On the x-axis is time, I think it's several weeks. And on the y-axis is its acceleration. And you can see that it's quite complex. There's very low acceleration, there's very high acceleration, and there seems to be some sort of fluctuation. But generally it just looks messy, to me at least. The second example is an elephant track. And this is a location track; I think it was a GPS track. So this would be like a 2D projection onto a map. You can put a map underneath it and then you can see where the elephant is going in that territory.
The things that you can identify as a naive person from this are that there are parts where the elephant is just moving around in a little squiggle, and then there are parts where it's taking long straight lines, traveling further distances. That's at least what it looks like. As well as the movement data, you've often got other sorts of data: environmental and climate data, as well as information on the tracked individuals, like their age, sex, condition, phenotype. And sometimes, if you've taken a blood sample, you've got their genotype as well. What do you do with this kind of messy data? You look for patterns. And because this sort of data is so complex, you often then go and ask statisticians to help. One of the most common ways to analyze movement data is using hidden Markov models. There are other ways, but this is quite a dominant way to analyze this data. And it's also the way that the data I showed in the previous slide was analyzed. So I'll look at this. Take the example of the shark. This is the same data, now with the states shown in color. And you can also see the model that's been produced from the data. I won't go into too much detail because it does get quite technical. But what we can see is that the model has split up the data into three states. There's a blue state, which is low acceleration, a yellow state, which is medium acceleration, and then a green state, which is high acceleration. So the model has, in effect, decided that the best way to represent this data is in terms of these three states that the shark can be in. And then what you can do is look at the likelihood that the shark will be in one of these states depending on the time of day, to see if there's a circadian rhythm in the shark's movement. And then you can see that it looks like there is a bit of a rhythm. This high energetic movement seems to occur only in the early morning and in the evening.
Whereas the medium acceleration state seems to be the one that happens most often. And you can sort of see that from the data as well. Now, what's interesting about this sort of model is that it delivers these three states. And you can see on the graph that they're called cruising, active swimming, and high energetic movement. But what it's actually delivering is low acceleration, medium acceleration, and high acceleration states. So it's up to the researchers to decide what these states mean. And here they've decided it's cruising, active swimming, and some sort of high energetic movement, with what exactly that is yet to be decided. So this is one of the major challenges in dealing with this data and with these sorts of models: interpreting what the models are telling you. What are these states? Is high energetic movement foraging or hunting? Is it perhaps mating? Or traveling? This is not clear to a statistician dealing with the data. For that, you need a biologist who knows what sharks do. I'm not a biologist. I don't know what sharks do. But I imagine it's something like hunting in this case. What's interesting is that you need this background biological knowledge of what's typical for shark behavior. You see this also in the case of the elephants, where there are similarly three states. One is a low movement state, which they've labeled resting. There's a medium movement state, which they've called area-restricted search. So the animal is moving, doing more than resting, but it's not traveling. And traveling is the third state. This is actually a classic way to label these three states for terrestrial animals. And it's up to biologists then to say, well, what does this area-restricted search state mean for an elephant? Maybe it means it's going through a farmer's field and destroying all the crops and picking out the corn that it likes.
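The gap described here, between what the model delivers and what the researchers call it, can be made concrete with a minimal sketch of hidden Markov model decoding. This is not the analysis used for the shark data: the emission and transition values below are invented for illustration, the function names are hypothetical, and a real analysis would estimate the parameters from the data rather than fix them by hand. The sketch uses the standard Viterbi algorithm to assign each acceleration reading to one of three states, and keeps the behavioral labels in a separate table to show that they are added by the researcher afterwards.

```python
import math

# The model only distinguishes these three acceleration regimes.
STATES = ["low", "medium", "high"]

# Researcher-supplied interpretation (an assumption, not a model output).
LABELS = {"low": "cruising", "medium": "active swimming",
          "high": "high energetic movement"}

# Made-up Gaussian emission parameters per state: (mean, standard deviation).
EMISSIONS = {"low": (0.1, 0.05), "medium": (0.5, 0.1), "high": (1.5, 0.3)}

# Made-up transition probabilities: states tend to persist between time steps.
TRANSITIONS = {
    "low": {"low": 0.90, "medium": 0.09, "high": 0.01},
    "medium": {"low": 0.10, "medium": 0.85, "high": 0.05},
    "high": {"low": 0.05, "medium": 0.25, "high": 0.70},
}
INITIAL = {"low": 1 / 3, "medium": 1 / 3, "high": 1 / 3}


def log_gaussian(x, mean, std):
    """Log density of a normal distribution with the given mean and std at x."""
    return (-0.5 * math.log(2 * math.pi * std ** 2)
            - (x - mean) ** 2 / (2 * std ** 2))


def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    if not observations:
        return []
    # best[s] holds the log probability of the best path ending in state s.
    best = {s: math.log(INITIAL[s]) + log_gaussian(observations[0], *EMISSIONS[s])
            for s in STATES}
    backpointers = []
    for x in observations[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANSITIONS[p][s]))
            pointers[s] = prev
            new_best[s] = (best[prev] + math.log(TRANSITIONS[prev][s])
                           + log_gaussian(x, *EMISSIONS[s]))
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the most likely final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def label_states(path):
    """Translate decoded states into the researcher's behavioral labels."""
    return [LABELS[s] for s in path]
```

Calling `viterbi` on a list of acceleration readings returns only "low", "medium" and "high"; it is `label_states`, with its hand-written table, that turns those into behaviors. Changing the table changes the story without changing the model.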
What's then interesting in particular about this model is that it demonstrates a second challenge in dealing with movement data, and that's selecting variables. A statistician probably wouldn't choose to plot these state probabilities against temperature, but the biologists were interested in how increasing temperatures affect elephant herds. And what they found was that at high temperatures, quite surprisingly, elephants were more likely to travel, to engage in these kinds of long-distance walks. And what the biologists then inferred is that they're looking for water. And probably the biologists were out there also looking for water on a hot day and observing how water got scarcer as the temperatures increased. What I think these sorts of examples show us is that analyzing movement data needs more than statistics. You can download the data off the internet. You can run an HMM. But you're going to have to have some sort of background behavioral knowledge about the species, or about a closely related species, in order to interpret what that model is telling you, as well as to then investigate it in more detail in terms of causal explanations or the function of the behavior. You also often need theoretical assumptions. That wasn't super clear in the examples that I showed. But you can see it in some other examples, where there's an assumption that what the animals are doing is going to be adaptive. Maybe you've got optimal foraging theory in mind. And that then shapes the way that you interpret the data that you're seeing, and the way you explain it as well. And both of these mean that we can think back to these claims about the revolutionary nature of these technologies. It's not quite clear that we are going to be getting entirely novel results if we're still relying on theoretical assumptions to interpret these data.
And we're also not entirely departing from some of the biases that we already had, because we're feeding our own behavioral observations back into our interpretation of the models. And this then feeds our own biases back in. So what do we see as the important things that they're doing there? It's foraging, it's mating, things like this. Who knows? Maybe they're doing something completely different that we haven't noticed. What can Icarus learn from what I've just been talking about? I think it's pretty clear that radio transmitters can't tell scientists much on their own. The data that they deliver must be interpreted. And that interpretation requires not just researchers at their desks doing stats, but also researchers in the field and in the lab looking at animals, dealing with animals in their day-to-day life. And both of these need theory, of course, as well. As a consequence, it's not entirely likely that we're going to get completely new insights that overturn theory or overturn the initial beliefs that we had about what animals were doing. Of course, you do get some new insights, especially into the different sorts of variation that there are. But you have to be cautious about these sorts of claims, I think. So apart from the irony that this project is called Icarus, I think there's something else we can take away from this as philosophers. These sorts of claims about big data, we've investigated them a lot of the time in lab contexts, things like human biology, or immunology, or large-scale projects in epigenetics, say. What's been done less is looking at big data in field sciences, like tracking animal movement. And it's interesting that here you see the same sorts of things coming up, where you need this practical, maybe embedded knowledge of the organisms that you're dealing with in order to interpret the data and use the data for anything. And with that, I would like to finish up.
This is the research team that I've been working with, the NC cubed group. And that's where I got a lot of the insights for the talk today. So I'd like to thank them and also, obviously, the DFG for funding me, and thanks to you all for listening.
Very cool example. Thanks so much. Let me do my usual and ask a question now while we wait for everybody to catch up with the end of the broadcast. Obviously you know some of these biologists well. You're engaging with them regularly. So I wonder what their attitude is toward these theoretical presuppositions, toward their relationship to theory; I guess that's probably the best brief way to put it. Because this is a classic thing that we sometimes hear about: there's a kind of distrust of theory, and perhaps the reason that they're presenting their work this way is that they want to show themselves as just reading off the data, right? As not being engaged in a theory-laden enterprise. So I wonder what that's been like in your discussions with them.
Yeah, that's a good question. I haven't explicitly gone into these questions, say about adaptationism, with them. That's future work that I'll be doing, hopefully. But I am fortunate to be in a team where there is quite a lot of interest in theory and in philosophy, actually. So they're quite happy to engage with me in discussions about much more theoretical, pie-in-the-sky sorts of topics. My thesis is on individuality, and we had a lot of discussions about these sorts of theoretical ideas to do with individuality. And I would ask critical questions. And sometimes they do get a bit uncomfortable with it. But, potentially because I am in a German context, there was less hesitancy to rely on theory. And I do think it does have something to do with the German context, actually.
That's interesting, yeah, the impact of broader scientific cultures here. That's a cool angle that I hadn't thought about. And of course, it's relevant for a project like Icarus that's, I assume, totally international, right?
Yeah, although it is hosted by the Max Planck Institute for Ornithology in Germany. But the main researcher there was working in the US for a long time as well, so, yeah.
Sure, a wide, wide network. Let me pick up on another element here. I'll try to think about how to phrase this. You mentioned that all this data is publicly available. And so I wonder, is there a significant environment of reuse around this data? What have you been able to pick up on that? I'm sort of riffing on Sabina's talk from earlier. Is there a community built around the utilization of this kind of data? Can you tell from working with the biologists that you've talked to?
That's another good question. I haven't seen that with the biologists I've been working with. They're much more engaged with producing the data. And I think also, because of some of the stuff I talked about in the talk, reuse is going to be quite challenging for a lot of these researchers. That's partly because you need this background knowledge of what the species normally does: because there's such variety, you have to know something about these species. And often you don't. As a behavioral biologist, you know about one sort of species, but not about these others. But apart from that, there's also the issue that a lot of this data is being generated in very specific kinds of situations. That said, I think that there must be people reusing this data, or they wouldn't bother to put it up.
And especially, I think, with the migration data, that's where things start to get interesting, because people look at it in terms of the spreading of diseases, or changing land use with changing agricultural patterns, these sorts of things. So yeah, it's something that I should actually look into in future research: how this data is traveling into other contexts.
Question coming in from Sarah Davies, who says: what a fantastic case. I really like the idea of looking at data creation and management in different kinds of spaces. I'm also reminded of the work on metadata management that we heard about the other day. So how do your informants deal with this? How much data cleaning, data care, data curation do these data need? And also, where's this done? Does that kind of work happen out in the field? Or is that happening back in the lab?
Yeah, good question. Thanks. I don't know too much about the extent to which they have to clean this data. I imagine there's a fair bit of cleaning that goes on, although it is often quite simple data: location plus timestamp. It would usually happen in a lab. When you go out into the field, often you don't have a good internet connection. So you're not actually receiving the data as it's being produced, unless you attach the transmitters and then go straight back to your lab and watch it happen. So these people in Antarctica, they spend six months down there, and only when they get back do they finally get all the results that had been accumulating and start to look into that data. But obviously it depends on your study system as well, where you are and what you're doing.
Serious delayed gratification for Antarctic work. That's impressive. Let's see. Ali Belovil writes: I like what you say about theory informing the statistics and the models that you use. Can you elaborate on your experience working with a statistician?
So is there a back and forth process between the questions that you want to ask, the data you have available, and the statistics that are available to be used?
So I haven't really been working with Roland. I mean, I've had conversations with Roland. And this is actually where I got the idea for most of the stuff in this talk. He's told me a lot about how he, as a statistician, has no idea about biology. He really has no training in it. He was asking me at the start of the research group, what's a phenotype? Can you explain to me what a phenotype is? I don't understand. So what's been interesting is looking at these challenges in communication between statisticians and biologists. And the difficulty goes both ways, because obviously biologists don't know what an HMM is. They have to figure out what it is that this model is doing: what are these states, and what is that telling me? So you see a really interesting learning process between these different communities. And obviously I've learned a whole lot, because I didn't know what an HMM was either. Well, not really. So I think that's one of the most interesting things about these interdisciplinary communications: the learning that takes place, and then also some of the challenges that they have to deal with in the process.
Just to follow up on that, and this is for my clarification: the group that you're working with, it's all of these people mixed together in a big research cluster? That's really cool. That's very cool. OK, next question coming in from Christoph Maltaire, who writes: a fascinating area of research. You mentioned two limitations, concerning interpreting the data and selecting the variables. I'm wondering also about the role of the choice of model type. So for example, HMMs versus other kinds of models, or the choice of hyperparameters for fitting the models.
So do you have any thoughts on that, for either this particular case study or more generally?
Yeah, that's a good question. I don't know too much about what sorts of thought processes go into choosing the models. I have a feeling that, at least in the group that I'm dealing with, the statistician is quite familiar with HMMs. And it is one of the most common ways to deal with movement data, because it's a way of modeling steps, and that's often what you've got: step data. But obviously they can do things like linear regression. And biologists are more familiar with linear regression. So maybe they'll try that out first. And if it's proving too difficult, they'll go to the statistician. And the statistician will say, here's this tool that I've got, let's try it out. So I have a feeling it's more a matter of which tools are available than of a very considered process about what would be best for this data set. I mean, people have talked about this kind of tool-based use of stats in other cases as well.
Yeah, as you say, that fits with a sort of toolbox conception of biological modeling that you hear about fairly frequently, in our literatures at least.
Yeah, for sure.
Let me see. With that, I think we're running out of questions. And I don't have anything else to add. So let me just wait, again, to give people a little bit more chance to type something into the box if you're interested. We are almost out of time, so that's not a problem. But if anybody has anything they want to add. All right, I think we can go ahead and wrap it up there then. Thank you so much. Really, really cool example. I'm interested to see where the work keeps going. And I'm going to go look at all the fun movement data online soon, too. That's really neat.
Cool, thanks.
Fantastic. Thanks so much. And we'll see everybody back here in a few minutes.
It's one of our slightly longer breaks. So we'll see you in a few minutes for the next talk. Thanks so much. See you.