So, since everyone's here and I think it's 4:15, without further ado I'll start the session. Hi everyone, welcome. I'm Ulfa Maria. I currently work as the head of the Science Unit and data analyst for the Wildlife Conservation Society in the Nature Program. Welcome to the Ecology and Environmental Sciences session. Today we will have Timofey Samsonov from the Faculty of Geography, Lomonosov Moscow State University, and then we have Liam Daniel Bailey from the IZW in Berlin. Everyone in the audience is welcome to interact by leaving messages in the corresponding channel in Slack: if you don't know it yet, search for the channel #talk_ecology_environment, and you can post any questions there; I'll be checking it. You're also welcome to put your questions in the Q&A and vote for questions as well. As a reminder, each talk has a 20-minute slot, with 15 minutes of speaking and the rest for questions. Due to time limitations, we can take at most four questions per speaker.

Hello, everyone. First I will put the link to the package into our chat. So here's the link, and I would like to begin with a short demonstration of the final output that can be achieved using the package. The grwat package is being developed for automatic analysis and separation of river hydrographs. GRWAT stands for "GRound WATer"; it's the name of an old program that was written for this purpose. The river hydrograph, the time series of river discharge, is one of the most important hydrological characteristics, and one of the crucial tasks when analyzing a hydrograph is separating its curve into quickflow and baseflow components and revealing the genesis of each quickflow event. Here on the screen we can see the report produced by the grwat package. It consists of several main areas, the first of which describes the shape of the river hydrograph for different years.
It also contains multiple plots that shed light on the interannual changes of the main hydrograph characteristics, such as annual groundwater runoff, spring flood runoff, the number of days of thaws, and the month of minimal monthly discharge during the summer. Mainly, the package automates the routine operations that arise during the analysis of a river hydrograph, and provides the specialist with a useful tool to compare changes in the river hydrograph with climatic changes, together with statistical tests that show how significant these changes are. The third area of the report covers long-term changes, which are calculated by subdividing the annual discharge values into two periods separated by some year, either selected manually or determined by statistical tests. The final area is a table that provides summary information about several dozen characteristics calculated for river discharge and related variables. Here we can see the name of the variable, the change year revealed by the Pettitt test, and various statistics which indicate whether the changes between the two periods, one calculated before the change year and one after, are significant. Such a report is calculated just from river discharge data and companion meteorological data. Now I would like to proceed with the underlying functions contained in the package, to shed some light on its inner functionality.

So I will proceed with my presentation; I hope that it will... yes, I see it's here. I will talk about the grwat package, which was developed for automated hydrograph analysis. My presentation will mainly follow the content of the main vignette of the package. We have included some testing data inside the package, and for this case study I will also use some standard packages commonly applied in spatial data analysis, such as sf and mapview, and also the tidyverse for general manipulation and plotting of the data. The package is loaded just right here. The testing data contained in the package is the Spas-Zagorye gauge on the Protva River.
So we begin with loading this data, and we can see that it's just a very simple data frame containing the date of observation and the value of river discharge at this gauge. grwat is not sensitive to the format of the data: having a date column, or having multiple columns containing the day, month, and year, are both perfectly fine for further analysis. One of the common problems is joining the river data with meteorological data, which is not always available at the gauge. If we have to collect meteorological data such as temperature and precipitation from external sources, we have to use a spatial join. For spatial joining we must have some region, be it a buffer zone or the basin of the gauge. For demonstration purposes, the basin related to this river gauge is also included in the package and can be loaded using this operation here. Usually it's quite correct to buffer the initial spatial data before extracting meteorological variables, because some autocorrelation exists within some distance. We have the gr_buffer_geo function, which can buffer polygonal or point data using an optimal projection, so it behaves similarly in polar and equatorial regions. In our further analysis we will use these red zones to extract reanalysis data and join it with the hydrological time series. There are multiple modern reanalysis datasets available on the web; one of the latest is the ERA5 reanalysis, which was recently extended to earlier years. Some reanalysis data is included with the package: it has a daily resolution, resampled from hourly observations, and it can be downloaded using this link to test the functionality of the package. These files are pretty large, so we do not include them in the distribution. It covers the East European territory of Russia, but later we hope to extend its coverage and also to add functions for joining ERA5 data directly. This is the standard workflow for using the reanalysis.
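To make the two accepted input layouts concrete, here is a minimal base-R sketch. The column names are illustrative only, not grwat's required names; the point is simply that a date column and split day/month/year columns carry the same information:

```r
# A minimal sketch of the two input layouts the speaker describes.
# Column names are illustrative; either a Date column or separate
# day/month/year columns can represent the same daily discharge series.

# Layout 1: a Date column plus daily discharge (m^3/s)
hdata_date <- data.frame(
  Date = seq(as.Date("1958-01-01"), as.Date("1958-01-10"), by = "day"),
  Q    = c(2.1, 2.0, 1.9, 1.9, 2.3, 3.8, 5.2, 4.6, 3.9, 3.2)
)

# Layout 2: the same observations with day/month/year split out
hdata_dmy <- data.frame(
  day   = as.integer(format(hdata_date$Date, "%d")),
  month = as.integer(format(hdata_date$Date, "%m")),
  year  = as.integer(format(hdata_date$Date, "%Y")),
  Q     = hdata_date$Q
)

# The two layouts are interconvertible without loss
reconstructed <- as.Date(with(hdata_dmy, paste(year, month, day, sep = "-")))
all(reconstructed == hdata_date$Date)  # TRUE
```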
We have a specialized function, gr_read_rean, which reads the precipitation and temperature datasets; the result is then automatically joined to the hydrological data using the polygonal buffer we created earlier. The resulting dataset looks like this: there is a date, discharge, temperature, and precipitation for each day. Now, this is the map of the spatial configuration. These small dots are the centers of the cells of the reanalysis data, and these large black dots are those which were selected for joining to the hydrological data. So, by changing the polygon, we will obtain different reanalysis data here.

grwat also contains functions to fill gaps in the data. This can be done using a maximum gap size in days, or using an automatically calculated autocorrelation function. If we set the smallest acceptable autocorrelation to 0.7, then we can calculate the number of days over which the autocorrelation stays above this value, and grwat will fill gaps that do not exceed this length using simple linear interpolation.

Next, let's proceed to the most interesting part of the package. There are functions to reveal quickflow and baseflow in the data, and also to reveal the genetic components. First, the function gr_baseflow simply extracts the baseflow component, which is mapped in red here. Sorry for the Cyrillic; these are automatically localized labels. The standard method is the Lyne-Hollick method, and we can simply plot the results using ggplot graphics. gr_baseflow can be parameterized; here is an example of increasing the number of iterations and the smoothing parameter. There are also multiple other methods of calculation, including the Boughton, Maxwell, and Jakeman methods. Here we can see that the results of the calculations differ, and each of them can be parameterized by its own set of parameters. Advanced separation includes revealing the genetic components.
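The Lyne-Hollick method mentioned here is, at its core, a one-parameter recursive digital filter. The following is a heavily simplified base-R sketch of that core idea (a single forward pass only; grwat's gr_baseflow applies multiple passes and further refinements, so this is not the package's actual implementation):

```r
# Minimal one-pass Lyne-Hollick digital filter (illustrative sketch only;
# the real method runs multiple forward/backward passes over the series).
lyne_hollick <- function(Q, alpha = 0.925) {
  qf <- numeric(length(Q))       # filtered quickflow component
  qf[1] <- Q[1]
  for (i in 2:length(Q)) {
    qf[i] <- alpha * qf[i - 1] + (1 + alpha) / 2 * (Q[i] - Q[i - 1])
  }
  # Baseflow is the remainder, constrained to 0 <= baseflow <= Q
  pmin(pmax(Q - qf, 0), Q)
}

# A synthetic hydrograph: low flow with a single flood peak
Q  <- c(2, 2, 2, 5, 12, 20, 14, 8, 5, 3, 2.5, 2.2)
qb <- lyne_hollick(Q)
all(qb >= 0 & qb <= Q)  # TRUE: baseflow never exceeds total discharge
```

The smoothing parameter alpha plays the same role as the parameterization the speaker shows: larger values damp the quickflow response more strongly.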
Genetic components show the genesis of each quickflow event. To define whether the quickflow was influenced by a rain or a snowmelt event, we must include some external meteorological data, and that's why we were joining it at the earlier stages. The gr_help_params function helps reveal the meaning of the multiple parameters used for advanced separation. There are a large number of these parameters, so we represent them as a list, which is obtained using the gr_get_params function. The parameters are different for plain and mountainous regions, and they can be set by simply accessing the list elements. After the separation parameters are set, we can proceed with the separation of the hydrograph, which is done by the gr_separate function. After the hydrograph is separated, it can be summarized into a set of variables that describe the interannual changes of the hydrograph and its behavior during each year of observation. The plotting and testing functions are dedicated to the graphical and numerical representation of the behavior that can be seen in multi-year changes of the hydrograph, and the graphical functions are mainly based on ggplot graphics. This is an example of using the gr_plot_sep function, which is dedicated to automatic plotting of the results of the separation. On this plot we can see the four types of discharge: ground, and also rain, thaw, and seasonal, the last being the main thaw discharge, which usually appears during the spring. Interannual change variables are computed using the gr_summarize function, and their meaning can be explained using the gr_help_vars function. gr_test_vars performs statistical testing, which mainly aims at revealing statistically significant interannual trends and changes. Long-term changes are calculated using some characteristic year. These variables can be plotted using dedicated functions, which allow you to visualize them in this way.
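To illustrate the idea behind the summarize step, here is a base-R sketch of collapsing daily discharge into yearly variables. This is only the general pattern, not grwat's implementation; gr_summarize computes dozens of such characteristics, and the variable names below are invented:

```r
# Sketch of the summarize idea: collapse daily discharge into yearly
# characteristics (one row per year of observation).
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2004-12-31"), by = "day")
Q <- 5 + 3 * sin(2 * pi * as.integer(format(dates, "%j")) / 365) +
  rnorm(length(dates), sd = 0.3)

daily <- data.frame(year = as.integer(format(dates, "%Y")), Q = Q)

yearly <- data.frame(
  year  = sort(unique(daily$year)),
  Qmean = tapply(daily$Q, daily$year, mean),   # mean annual discharge
  Qmax  = tapply(daily$Q, daily$year, max),    # annual peak discharge
  Qmin  = tapply(daily$Q, daily$year, min)     # annual low flow
)
nrow(yearly)  # 5: one row per year of observation
```

Trend tests and change-point tests then operate on these yearly series rather than on the daily hydrograph itself.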
So every plot contains the initial time series. It also contains the trend line, which is solid when the trend is significant and dashed when it is not, and also the change year, which is revealed using the Pettitt test. The resulting reports can be generated using the gr_report function. That's the last slide; I need to update my presentation here a bit. Generally, the package is not yet released on CRAN, so it's at the stage of active development, but we would be very glad to receive any feedback on its functionality or on any problems with it. I hope that it will be useful for hydrologists and everyone who works with hydrological data and its analysis.

Right. Thank you, Timofey. Can we take questions now? Yes, yes. Cool. So the first question is from Shimela: what is the buffer distance limitation, and why? Generally, the buffer distance limitation relates to spatial autocorrelation. We don't have specialized functions to reveal the most reliable buffer distance, but generally it can be calculated by taking some reanalysis points and analyzing how the correlation decreases with increasing distance. So you set some threshold value, for example 0.8, and see at which distance the actual correlation between reanalysis points drops below this value. This will be your maximum buffer distance.

Thank you. Now there's one from Michael Stortzel: the straight baseflow lines look really strange. Do you have any evidence that the amount of the streamflow components is correctly calculated, for example, with isotope separation? Okay, can you repeat the question? So the question is about the straight baseflow lines. Yeah, I understand. So this is one of the approaches which was historically actively used in the Russian hydrological school.
And we know that it is very questionable. In fact, the package will be extended with multiple functions that reveal the trend of the ground flow between different events, and such a straight line is not always reliable. In many cases, some smoothing curve that is closer to a baseflow separation, like in the Lyne-Hollick algorithm, is better suited to fit the ground flow.

Thank you, Timofey. One more question: do all methodological approaches bring the same result? If so, is it possible to use this, I assume, as a verification approach? I need to specify the question: is it about different quickflow extraction algorithms, or something else? Can we go first to page 18? Okay, maybe. So on page 18 we see the interannual change variables, which are calculated for each year. Here, the hydrograph is daily data, and the interannual change variables are yearly data: the hydrograph is summarized, and aggregated values are calculated from the separations based on this data. The main degrees of freedom in applying different methodological approaches are to parameterize the separation and to use different algorithms for extraction of the ground component. As for verification, we don't have such data, but we can compare the actual separation with the meteorological data, which can be obtained from weather stations.

Thank you very much. I have a question, but I think we ran out of time. So, thank you, Timofey, for your talk. We will talk later in the channel as well. We will go now to Liam. So Liam, please have a go.

Hi, everyone. My name is Liam Bailey. I'm a postdoc at the Leibniz Institute for Zoo and Wildlife Research in Berlin. Today I'm going to be talking to you about agent-based modeling in R, with a particular focus on the R6 package and object-oriented programming. So what is agent-based modeling? Agent-based modeling is a simulation technique.
We simulate the interaction of different agents, or you could say individuals, and from the simulation we're then able to observe more complex emergent properties of the system that we're studying. These emergent properties are often difficult, if not impossible, to simulate directly. So agent-based modeling is great for really complicated or complex systems. It's used across a wide range of fields, including to model things like traffic behavior and disease outbreaks, and, in my area of research, to better understand complex animal systems.

To give a bit more of an example of an agent-based model, let's imagine we have a population of animals. Each one of the individuals in our population would be considered an agent, and they have some attributes, like an age and a sex. We then simulate these agents interacting with one another, and the way the agents interact can be informed by our biological knowledge of what we've observed in the wild. Then, after we run our simulation, we're able to observe some more complex emergent properties, say a change in population size over time.

So you might be thinking that agent-based modeling is useful for work that you do, and so how would you actually get started on this? Well, if you look at the number of different agent-based modeling tools out there, there's really a huge variety. This GIF just shows a list of a few of the possibilities listed on Wikipedia, but it's by no means exhaustive. Much of the software that is available is free and open source, so it really makes the decision of which one to use quite hard. Some of the major platforms used in the biological sciences, at least, are NetLogo, Swarm, and Repast. All of these are really powerful tools for agent-based modeling, but one thing they all have in common is that they don't use R. NetLogo has its own NetLogo language, while Swarm uses Java, and Repast can be used with both Java and C++.
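The animal-population example described here can be sketched in a few lines of base R: agents with an age and a sex, yearly aging, mortality, and reproduction, and population size as the emergent property. All of the rates below are invented purely for illustration:

```r
# A toy agent-based model in base R: each agent is a row with an age and
# a sex; each simulated year agents age, may die, and may reproduce.
# All rates here are invented for illustration.
set.seed(42)
agents <- data.frame(age = sample(0:5, 50, replace = TRUE),
                     sex = sample(c("F", "M"), 50, replace = TRUE))
pop_size <- integer(20)

for (year in 1:20) {
  agents$age <- agents$age + 1                    # everyone grows older
  agents <- agents[runif(nrow(agents)) > 0.15, ]  # 15% yearly mortality
  mothers <- sum(agents$sex == "F" & agents$age >= 2)
  births  <- rbinom(1, mothers, 0.5)              # adult females may breed
  if (births > 0) {
    agents <- rbind(agents,
                    data.frame(age = 0,
                               sex = sample(c("F", "M"), births,
                                            replace = TRUE)))
  }
  pop_size[year] <- nrow(agents)  # emergent property: population size
}
length(pop_size)  # 20 yearly observations of the emergent population size
```

Nothing in the rules mentions population size directly; the trajectory in `pop_size` emerges from the individual-level rules, which is exactly the point of the technique.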
Now, of course, working in a language other than R isn't inherently bad; it's not problematic in itself. But as many of you might know, researchers are often taught R during their bachelor's or undergraduate studies, so this is the language that they're more familiar with. If they need to do agent-based modeling using these types of tools, then we might end up with a workflow that looks something like this. We start out doing some data wrangling: we get some data on the animals or the system that we're studying and have to clean and organize it, and we might use dplyr or data.table. After this, you might want to fit some statistical models to better understand how your agents might interact, and you could do, say, mixed-effects models with lme4 or GAMs using mgcv. After this point, you then need to move into your agent-based modeling tool. So you move out of R into whatever language is appropriate for the tool that you're using, and in this tool you'll then need to do things like documentation and unit testing. At some point you would have some results, and at this point many people who are familiar with R will probably jump back into R and plot and visualize their results using something like ggplot. And so this requirement to move from a familiar language into a less familiar one can be a bit of a hurdle, and it may discourage researchers who could benefit from using agent-based modeling. I really think that's a bit of a shame, because it's quite a powerful tool.

So this is where R6 comes in. R6 is a package available on CRAN which allows us to program using encapsulated object-oriented programming in R. This kind of encapsulated object-oriented programming is already available in Java, C++, and NetLogo, and that's likely why these are systems that have been used previously for agent-based modeling.
The package was created by Winston Chang, and he actually presented R6 for the first time at the useR! conference in Brussels in 2017. If you're interested in learning a bit more about the package, I would really recommend that you find his talk on YouTube; it gives a really good introduction to the package and how it works. R6 is a real foundational package of the R community: it has over 350 reverse dependencies, including major packages like dplyr and Shiny.

So how would we use this? How does it work, and how would we use it in agent-based modeling? Within encapsulated object-oriented programming, each object that we create has attributes contained within it. These are characteristics of the object, things like its sex or its age, if we look at this object down the bottom here. It also contains methods: these are things that the object can do. So our animal object here could grow old, its age could increase, or it could have a baby, that is, create a new animal. Writing R6 is really simple. It has quite a simple syntax, and this was intentional, to improve upon the object-oriented system of Reference Classes that's native within R. For the work that we've done, we found that it's quite fast, at least fast enough for our purposes. All of these factors mean that it's been quite good for the work that we've wanted to do. Because R6 is an R package, it means that we can use it with other R development tools that you might already be familiar with. For example, we can build our model within an RStudio project, we can document our model using roxygen2 version 7, and we can unit test everything using something like testthat. All of these features mean that using R6 is both conceptually and practically great for agent-based modeling, and we found it to be a really great system for the work that we've been doing.
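A minimal R6 class along the lines of the animal object described above looks like this. The field and method names are my own illustrative choices, not taken from the speaker's package:

```r
library(R6)

# Minimal R6 agent: attributes (fields) plus methods, as described above.
Animal <- R6Class("Animal",
  public = list(
    age = NULL,
    sex = NULL,
    initialize = function(age = 0, sex = c("F", "M")) {
      self$age <- age
      self$sex <- match.arg(sex)
    },
    grow_old = function() {   # a method: increment the agent's age
      self$age <- self$age + 1
      invisible(self)
    },
    have_baby = function() {  # a method: create and return a new agent
      Animal$new(age = 0, sex = sample(c("F", "M"), 1))
    }
  )
)

mother <- Animal$new(age = 3, sex = "F")
mother$grow_old()
baby <- mother$have_baby()
mother$age  # 4
baby$age    # 0
```

Because R6 objects have reference semantics, `mother$grow_old()` mutates the object in place, which is exactly the behavior agent-based simulations need.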
And hopefully that means that if people can begin to use R6 for agent-based models, we can move from a workflow like this to one that looks more like this, where people can stay within a language that they're familiar with through the whole process. So it's all well and good to talk about this in theory, but I also wanted to give you a bit of a practical example of agent-based modeling with R6. Before I go into some live coding, I just want to point out that everything I'm going to present now is available in our public repo on GitHub, abm R6; you can see the link at the bottom of the slide. So if you're interested to learn more, please feel free to check out our repo.

The example I'm going to talk about today is the peppered moth. This is a classic evolutionary example. We're dealing with a species that can occur as both a white moth, here on the right, and a black moth, here on the left. In its natural environment, like that at the top of this cartoon, the peppered moth occurs on trees that have quite light-colored bark, and in these circumstances you can observe many more white moths than black moths. But during the Industrial Revolution, naturalists began to notice that trees were becoming darker due to all the pollution, and on these dark trees black moths were becoming much more frequent than white moths. We now know that this process is driven by natural selection: on a white tree, a white moth is better camouflaged, and so it is safer from predators; a predator like this bird is much more likely to see a black moth. In contrast, on a black tree, a white moth will be much more obvious and much more likely to be eaten by a bird. So we wanted to build an agent-based model that could recreate this natural phenomenon that has been observed, and to do this we built a model using R6 that followed three simple rules.
Firstly, moths will do better when their color matches that of the tree on which they live. Secondly, we allowed new moths to have a different color to that of their parents, based on a given mutation rate that we define in the model. And finally, the world will change color at a given frequency; in other words, the trees will change their bark from white to black and back to white again. This mimics the phenomenon of industrialization and then, eventually, the removal of many of the pollutants that were causing the darkening of the bark. Hopefully, with this model we can recreate the observed change in the frequency of black and white moths within the system.

So here is the RStudio project in which we built our agent-based model. We've built this as an R package, and we did this intentionally because we think it best demonstrates a lot of the power of working with R6, including things like documentation and unit testing. We have a README which includes a basic example that you can work through, so please feel free to check that out. To start off with, we're just going to run a simulation, and then I'll show you how the simulation works and what's going on inside. We can open up the documentation here. This is an R6 class that has been fully documented using roxygen2. I'm not going to talk through all of the attributes and methods here; I'll just move down to the examples, and we'll run the first example. So this will be a simulation. We're going to initialize it using this dollar-sign new, and this simulation will run for 200 years. It will have, let's say, 100 moths, just to make it run a bit quicker. And we've defined here the mutation rate, the probability that a moth will be a different color to its parents, and the period, the frequency with which the world will change color. So we've now initialized our simulation object.
We now want to run it, and to show you what's happening on the inside, I'm going to pause the simulation after 50 years so we can look at its internal workings. So we're now inside the simulation object, and we can look at some of the elements within it. Firstly, we can look at the world object, which is within the simulation. This is itself an R6 class, and it has an attribute, color, which is whether the trees are black or white, and a number of other methods and attributes. Now, of course, we can see this here, but a more convenient way would be to look at the help documentation for our world object. As you can see, it has the color attribute; it has the period attribute, the frequency with which it will change color; and it has a time attribute, which just allows us to progress the simulation over time. I won't go into all the methods, but you can see that each method of the object has its own section of documentation with its own example, and if we go down to the very bottom, all of the examples are grouped together. So that's the world that the moths are living in. The next thing we can look at is the moths themselves. We have a population, and this population contains a number of individuals; we see here that it's a list of 100 different moths. We can then look at the help documentation for the moths, and again we can see that the moths have a number of attributes of interest: they have a color, black or white, and a mutation rate. So this is how the model is structured: we have a simulation containing a world and a population, and the population is made up of multiple moths. I'll continue running the simulation now; we can get to the end and look at the result. We can use the plot method, which is a method of the simulation, and we now have our first result from our agent-based model. So in this plot we have the frequency of black.
I'll just maximize this here. We've got the frequency of black here on the y-axis, so one is black and zero is white, and we have time on the x-axis. The gray line in the back is the color of the world: the world begins as black, it then becomes white, it changes back to black, and it fluctuates like this over the full time period of the simulation. The blue line shows the frequency of black in our moths, the population of 100 moths that we've simulated. You'll notice that it follows the pattern I described earlier: when the world is black, the frequency of black moths is higher, and when the world is white, the frequency of white moths tends to be higher. It's important to point out here that we didn't explicitly include any relationship between moth coloration and time, so the pattern that we're seeing is simply a function of the simple rules I described when I introduced the simulation.

One great thing about working with an agent-based model is that we can take our simulation and run it with slightly different parameters and see how that affects our emergent properties. So here, again, we'll just run with 100 moths, and we've now made our mutation rate lower, so the probability that a moth will be a different color to that of its parents has been reduced. If we run this and plot it, you'll see we now get quite a different result, where at some point all of the moths become white, and it takes a long time before black moths come back into the population. This is simply a function of a lower mutation rate, and you can see that the power of agent-based models allows us to tinker with the system and see how it affects the emergent properties. So that's all from me for today. Thank you very much for listening. As I said, this code is available online on our GitHub repo, so please feel free to check it out if you want to learn more. And I'm happy to take some questions.
Thank you. Your example is on moths; are there any examples on big mammals? Yes, that's actually what I'm working on now for my research. We work with long-term data on hyenas, a population in Tanzania, and we've built an agent-based model that can recreate the complex social system of that species, because it has a very unusual hierarchy, with an alpha female and male dispersal and a whole lot of quite unusual elements. So that's in the works at the moment; we're working on a paper with that, and hopefully it will be available for people soon.

Cool, one more question. For the variables that you use in the agent-based model, here you used the mutation rate of the moths, for example. How can you adjust for other variables? For example, in the hyena case, you have this complex interaction; how do you include that within the model? Yeah, so, of course, you can continue adding attributes that you can tweak, the normal way you would approach this. Like here, I showed that we changed the mutation rate and it changes the properties; you might have multiple variables that you want to change and see how they change the properties. The common approach there would be something like a sensitivity analysis, where basically you would take each variable, vary it by a certain amount, and then quantify how that affects the emergent property in the end. Then you can see which of these variables is most important, which of them is influencing our population the most, and you get some idea of what the system is most sensitive to. This is actually quite useful because, assuming your model is well designed and well parameterized, it could give you some idea of how the natural system works. If you understand that, say, fluctuating survival of young has a bigger impact on population size, then maybe you would focus more on making sure young survival in the wild is increased.
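The sensitivity-analysis idea described here can be sketched very simply: vary one parameter across a range, rerun the model for each value, and record the emergent outcome. The toy model below is invented purely for illustration, not the speaker's moth simulation:

```r
# Sketch of a one-at-a-time sensitivity analysis on a toy simulation.
# toy_model and its parameters are illustrative, not from the talk's package.
set.seed(7)
toy_model <- function(mutation_rate, n_agents = 100, years = 50) {
  color <- rep(0L, n_agents)                 # 0 = white, 1 = black
  for (t in 1:years) {
    flip <- runif(n_agents) < mutation_rate  # each agent may switch color
    color[flip] <- 1L - color[flip]
  }
  mean(color)                                # emergent: frequency of black
}

rates   <- c(0.001, 0.01, 0.05, 0.1)
outcome <- sapply(rates, toy_model)

# One emergent outcome per tested parameter value; comparing them shows
# how sensitive the emergent property is to the mutation rate.
length(outcome)  # 4
```

A fuller analysis would repeat each setting many times and summarize the spread of outcomes, since a single stochastic run per value can be misleading.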
So these are the kinds of things you can do when you have multiple variables. One last question: if you are working with a species that has very limited data, population-wise, how do you account for that? Because I think that's the most common situation we find in conservation. Thanks. Yeah, so we're lucky with the hyenas: we work on a population with 25, 26 years of data now, and it's very well observed, so the model we're working on at the moment is almost entirely parameterized on things that we've directly observed. If you don't have this, it becomes a bit more tricky. There are a few ways you could approach it. One thing you could do is use what's called a pattern-oriented modeling approach. Basically, you have some way that you expect the simulation to look at the end: say it should have a certain sex ratio or a certain age ratio. Then you could put in some value of reproduction rate that you think might be applicable for your species, and check whether the simulation in the end recreates how things should be in the wild. That way you may not have detailed demographic rates, but you do have some idea that there should be, say, an equal sex ratio; and if you use a particular demographic rate and it doesn't reproduce that characteristic, you know that it's not a good choice, and you can tweak the variables in that way. But it's definitely more complex when you have less data. Our example that we're working on now is like the best-case scenario, and it becomes more complex when you're dealing with species where the data is scarce. Twenty years of data is rich for... Yeah, and for a large mammal it's very good; it's one of the best in the world, basically. But hopefully we can show the best-case scenario, and people can tweak from there to work out how to deal with their system. Yeah, cool. I'll let my friends know. Thank you very much.
If you have further questions and things to discuss, just go to the channel. And I'll see you around in the other talks.