Hello, I'm Editha Boysko, a PhD student and, actually, a neurobiologist, and I'm going to tell you about our research on computational modeling of animal behavior. For a technical note, the presentation is available via this link. I'm going to talk about our research on the cognitive process of reinforcement learning, which basically serves the purpose of adapting to an uncertain environment, and in particular about the component of time, the influence of time on this process. For this we use Q-learning algorithms from the reinforcement learning branch of machine learning, only a simple one with further modifications, and I must say we have a behavioral setting quite well suited for this purpose. I should probably start by telling you what reinforcement learning is. As I said, it serves the purpose of adapting to an uncertain environment, and as a ground rule this happens through the agent's interaction with that environment. The situation looks like this: an agent, for us a mouse, makes a decision, that is, chooses between options, and gets the outcome of the decision, and the outcome may take the form of access to a reward or access being denied. As a reward we use water with an artificial sweetener, saccharin, so the reinforcer is basically sweet taste. What's important is that after many, many trials the animal develops a certain expectation about each of the options. This is denoted as a Q-value, but we'll get to that shortly. And for the agent, for the mouse, I will refer to her as Emma, because I find that much more distinctive and easier to use.
Most of our experiments are performed in the IntelliCage, which the previous speaker mentioned. It is basically a big plastic box with four chambers, each located in a different corner of the cage. We put 14 female mice inside, switch the procedure on, and record all the actions the animals perform. The beverages are stored in bottles, and for a mouse this means that to access one she must first visit the corner, then the inner door must open, and then and only then can she actually access the bottle and lick it. The catch is that the door may or may not open: we set the chance as a probability according to the experimental scheme, which consists of two stages, adaptation and reversals. For the adaptation stage it's only important to know that the bottle with the rewarding beverage changes its position from one state to the next, so we have reversals. In the main test phase the position does not change, but the probability does. So in a state where we have a 90% probability in one corner and 30% in the other, after about 38 hours the state is reversed to the exact contrary: from 90 and 30 it becomes 30 and 90. That's why the task is termed reversal learning with probabilistic outcome, because the chances to access the reward are set with a given probability. From the point of view of the animal, it means she basically has a choice between one option and the other, because we keep the water corners outside of the modeling part. Now I will present the most basic Q-learning algorithm. It goes like this: Emma chooses between options, let's say A and B, and she has a given expectation toward each option. This expectation can be read as: on a scale of 0 to 1, how beneficial do you think this option will be in the future?
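The reversal schedule described above can be written down as a tiny sketch. The 90%/30% probabilities and the roughly 38-hour block length come from the talk; the function itself, its name, and the assumption of a strictly periodic flip are only an illustration:

```python
def reward_probs(hours_elapsed, p_high=0.9, p_low=0.3, block_hours=38):
    """Return the (corner A, corner B) reward probabilities at a given
    time, flipping the high- and low-probability corners every block."""
    flipped = (hours_elapsed // block_hours) % 2 == 1
    return (p_low, p_high) if flipped else (p_high, p_low)
```

After the flip the previously better corner becomes the worse one, which is what makes the task a probabilistic reversal learning problem.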
If the choice gets rewarded, the estimate should rise, whereas if it's not rewarded, it should drop. The increase and decrease are determined by the prediction error, which is, quite intuitively, just the difference between the real outcome and the expected one. The outcome is binary: there is access to the reward or there is none. So for the next iteration, for the next choice, the expectation changes. There is one free parameter here, called the learning rate, which denotes the weight put on the current choice: the greater its value, the more information from the current choice is transferred to the next one. If it's close to one, only the current choice matters for the expectation on the following ones. The second step is to compute the probability of each option being taken. For this we use the softmax function and feed it the expectations of the chosen and the non-chosen option. As you can see, it also has a free parameter, called beta, and the shape of the curve gets sharper with greater values of beta. This matters because if the difference between the expectations toward one option and the other is small, a sharp curve will make that difference noticeable, whereas an almost horizontal one will leave it unnoticeable. And real science involves challenges, so here is the problem we faced. In a typical experimental setup, a participant performs a series of cued reaction tasks with restricted reaction time. But in our experiment the animals are free to make their choices whenever they like, and the experiment lasts for about a month without any break.
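The update and choice rules described above can be sketched in a few lines; the function names and the logistic form of the two-option softmax are my own shorthand, not from the talk:

```python
import math

def q_update(q, reward, alpha):
    """Move the expectation toward the binary outcome (1 = reward,
    0 = no reward) by the learning rate alpha; the step size is the
    prediction error, outcome minus expectation."""
    prediction_error = reward - q
    return q + alpha * prediction_error

def softmax_p(q_chosen, q_other, beta):
    """Probability of taking the 'chosen' option; for two options the
    softmax reduces to a logistic curve whose steepness is set by beta."""
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))
```

With a high beta even a small gap between the two expectations translates into a strong preference, which is exactly the sharpness of the curve discussed above.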
So we can see that the distribution of intervals between choices is far more spread out, with a typical interval of around 10 minutes. Basically, it takes minutes for an animal to make a choice. Now I'll tell you about our two ideas for explaining this influence of time on decision-making. The first idea is pretty intuitive: animals simply forget over time, right? So I thought the expected rewards should also decay with time. I searched the World Wide Web far and wide and found some mathematical attempts to describe the decay, but no hard evidence in data; there are only theoretical considerations on this issue. But in the deep void of the internet I found a publication that has exactly such a function, along with further theoretical considerations, and I found it useful, so I applied it. After calculating the expectation toward each option, we simply decay it along with the time that passed between the choices. The elapsed time is distinct for the chosen and the non-chosen option, because the time passing since the last decision about A and since the last decision about B will be different. Again, this has a free parameter. It ranges from 0 to 1, and if its value is high, the decay of the expectation is fast, whereas if it's small, the decay is almost linear and really slow. The other explanation will be quite intuitive for anybody in rodent research. We found in the data that mice tend to repeat the last choice at longer intervals. If there's a really long interval between choices, probably the mouse doesn't know what to do, so she just chooses the last known option. We also tried to include this in the modeling strategy: after obtaining a probability with the softmax function, we calculate another coefficient, the log of the odds ratio for the action to be performed, which is basically the starting point for this.
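A minimal sketch of the decay idea, assuming an exponential form; the talk refers to a published decay function without spelling it out here, so the exact form and the parameter name sigma are my assumptions:

```python
import math

def decay_expectation(q, dt_hours, sigma):
    """Decay an option's expectation over the time elapsed since that
    option was last updated. With sigma in (0, 1), values near 1 give
    fast forgetting; values near 0 give slow, almost linear decay."""
    return q * math.exp(-sigma * dt_hours)

# The elapsed time is tracked separately per option, because the time
# since the last choice of A differs from the time since the last of B.
```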
There is also a free parameter, iota, and, as you can see, the probability of a given option being performed rises along with time; if the value of iota is high, the rise is quicker. The natural step now is to tell which model best fits the data, as that points to the best explanation. Obviously an absolute answer is not reachable, because it's modeling, but we can surely tell that a simulation of totally random choosing is out of the question, whereas the models including time components all outperform the basic algorithm. The measure here is AIC, the Akaike information criterion, which is quite useful for comparison but also suboptimal for our needs because, as you can see, it involves the sum of the negative logarithm of the probability of the chosen option. Since it's a sum, it strongly depends on the number of choices, and, as you can see here, there is quite a lot of within-group variability: the AIC value for this rather idle animal would be around 300, while for her super-active sister it would be far higher. The second step in assessing the quality of a model is to look at its parameter values: whether they fall in reasonable ranges, and then what they mean. This is far more useful when comparing between groups, that is, between experimental conditions, for example, but I will go over it only briefly and just note that for the decay model the decay parameter sits close to its floor, which means the forgetting is very slow, and the iota parameter values indicate that there really is an influence of time. As you can see, we do not run statistics on the group as a whole, because there is huge within-group variability; each dot represents one animal and the lines connect the dots. So we have a measure of fit to the data, and also parameter values, for every single animal individually.
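The repeat bias and the model-comparison measure can be sketched as follows. Only the general ideas come from the talk (a log-odds bonus for repeating the last choice that grows with the interval, and AIC as the fit measure); the linear-in-time form of the bonus and the function names are assumptions on my part:

```python
import math

def repeat_biased_p(p_softmax, repeats_last, dt_hours, iota):
    """Shift the softmax probability on the log-odds scale by a bonus
    that grows with the interval since the previous choice, favouring
    a repeat of the last action."""
    logit = math.log(p_softmax / (1.0 - p_softmax))
    logit += iota * dt_hours if repeats_last else -iota * dt_hours
    return 1.0 / (1.0 + math.exp(-logit))

def aic(neg_log_lik, k):
    """Akaike information criterion: twice the number of free parameters
    plus twice the summed negative log-probability of the chosen options.
    Being a sum, it grows with the number of choices an animal made."""
    return 2 * k + 2 * neg_log_lik
```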
What I want you to remember is that we have a quite well suited behavioral test, that we found that the interval between choices influences decision making, and that we have two ideas for explaining this influence. The first was slow decay of expectation with time, the decay model; the second was repeating the last choice, coupled with an increase in the probability of a given action being performed at longer intervals, because we found that mice tend to repeat the last choice at longer intervals. I know these models seem contradictory, but to me they seem complementary, because I think the ultimate explanation would be that the expectation first rises because of the consolidation process, the storing of memory, then is forgotten slowly, and then, when it reaches some indifference point, the animal probably just goes for the last known option. And that's it for the presentation. I want to thank people from my institute, Lukas Schumitz, Zafiharda and Janod Rieges Parkit Nahus here. For the code and data, please feel welcome to visit my GitHub account; there's a direct link to check out our last publication, and the presentation is available there too. So I'd like to thank you for this talk. And please do have questions. I think that automation will be a very important process. A few weeks ago I saw a talk by, I forget his name, a person from Princeton who looks at decision making in rats, and he has basically completely automated the whole process. The rats are trained in cages where they are presented with stimuli; he records from hundreds of neurons at the same time, and he has a hundred cages in his lab with hundreds of rats doing this task. Carlos Brody is his name, yes. So in this way he can get an enormous amount of data, and of course new automated tools to analyze it.
So I think this is a really important direction in which this will go. Further comments from the speakers? Something from the audience? Have you thought about performing this behavioral experiment in an open-source behavior box? Because the IntelliCage, I believe, is a fairly expensive system and not many people can afford to buy it. There have been some open-source behavioral designs where people can get plexiglass and really build things from scratch, and there are designs released by a few groups. Have you thought about using such a platform, so that everything would be completely open source? Well, there is a system called Eco-HAB which was developed by a colleague in our institute. If I remember correctly, it is presented as a kind of open hardware, and another colleague of mine develops a library, similar to PyMICE, to help analyze data from this system. So I guess that may be an example of what you are asking about. More comments? Yeah, I think we have all the information needed for the modeling part, so it could be performed in whatever experimental setting: we need information about the choices, about visits with probabilistically set outcomes, and about their timing. I wanted to comment, actually. The main difficulty is technical reliability. The IntelliCages are not fun to work with, and they are a standardized product with ten years of engineering experience behind them. With any open-source solution, if someone wants to assemble it and then run ten cages in parallel, they really want to see how many equipment failures per day they have and how they can somehow account for them in the data later on. So it becomes a problem of being technically able to perform the experiment at all.
The analysis really becomes a secondary issue, and I think this is the main limitation; it will not be solved simply by open source unless there is a strong focus on one solution which is extremely simple and widely used, which with open source is never likely. Yeah, hello. Thank you for the talks again. One issue regarding the modeling of your behavioral data: how do you make sure that your models are identifiable? I mean, after you fit your model to your empirical data and obtain the parameters, how do you make sure that these are unique, that no other parameter combination would lead to the same result on your data? Yes, this is a very important question. For the simple models which I presented, like the diffusion model, you can argue for identifiability because the parameters have different effects on different features of the data, but of course to have this identifiability you also need a sufficient amount of data, and you can assess it through a parameter recovery procedure. What you do is simulate the model with certain parameters and a certain number of trials, and then you estimate the parameters back. You repeat this hundreds of times, and you get a measure of how far, on average, your estimated parameters are from the true values, the ground truth. So this is how you can do it. As I mentioned, this becomes more difficult when you have more complicated models with more parameters. So for example, there was a question here about correlated noise. The more parameters we introduce... I was asking about something else: how do you detect that your noise is uncorrelated? Okay, but the more parameters you introduce, the more difficult it is to recover them. One interesting thing is that with reinforcement learning models you have these two parameters, alpha and beta.
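The simulate-then-refit loop described in the answer can be sketched end to end for the basic two-parameter model; everything here (the initial Q-values of 0.5, the grid, the trial count) is illustrative, and a real recovery study would use a proper optimizer and repeat the loop hundreds of times:

```python
import math
import random

def simulate_choices(alpha, beta, n_trials, p_reward=(0.9, 0.3), seed=0):
    """Simulate a two-option Q-learning agent with known (ground-truth)
    parameters; returns a list of (choice, reward) pairs."""
    rng = random.Random(seed)
    q = [0.5, 0.5]
    data = []
    for _ in range(n_trials):
        p_a = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_a else 1
        reward = 1 if rng.random() < p_reward[choice] else 0
        q[choice] += alpha * (reward - q[choice])
        data.append((choice, reward))
    return data

def neg_log_lik(alpha, beta, data):
    """Summed negative log-probability of the observed choices under
    the model, replaying the Q-value updates trial by trial."""
    q = [0.5, 0.5]
    nll = 0.0
    for choice, reward in data:
        p_a = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        p = p_a if choice == 0 else 1.0 - p_a
        nll -= math.log(max(p, 1e-12))
        q[choice] += alpha * (reward - q[choice])
    return nll

def recover(data):
    """Crude grid-search fit over (alpha, beta), purely for illustration."""
    grid_a = [i / 20 for i in range(1, 20)]
    grid_b = [0.5 * i for i in range(1, 21)]
    return min(((a, b) for a in grid_a for b in grid_b),
               key=lambda ab: neg_log_lik(ab[0], ab[1], data))
```

Comparing the recovered pair to the generating pair, over many simulated datasets, gives exactly the average-distance-from-ground-truth measure mentioned in the answer.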
And you can also use this parameter recovery to see how well you recover them. But you can improve this recovery by linking the reinforcement learning model with the diffusion model. The way you link them is to assume that the drift rate is equal to the difference between the Q-values of the two options on each trial. And there's a paper by Samuel McClure which shows that by adding reaction times and using this combined reinforcement-learning diffusion model you can actually reduce the errors in the estimates of these alpha and beta parameters.
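The linkage can be sketched like this: the per-trial drift rate is taken proportional to the Q-value difference, and a simple Euler scheme generates a choice together with a reaction time. The scaling parameter and all numeric settings are illustrative, not from the talk or the cited paper:

```python
import math
import random

def ddm_trial(drift, threshold=1.0, dt=0.001, noise=1.0, seed=None):
    """Simulate one drift-diffusion trial with Euler steps: evidence
    starts at 0 and accumulates until it crosses +threshold (choice 0)
    or -threshold (choice 1). Returns (choice, reaction_time)."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    return (0 if x > 0 else 1), t

# The link to reinforcement learning: on each trial, set
#   drift = v_scale * (q_a - q_b)
# so a large expectation gap produces fast, consistent choices, and the
# reaction times carry extra information about alpha and beta.
```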