Well, thank you all for still being here after a long day. I recently started in Bologna, where I am a new assistant professor, but most of my work has been done in England, where I did my PhD, and in Zurich, where I did my postdoc. In general I am a statistician, but I consider myself a very applied statistician, mostly in biology, because I always start from biological problems and then develop statistical methods. What I have done during my, say, short career can basically be split into two blocks. The first part, what I did during my PhD, was to build stochastic models of single-cell data in systems biology. In the second part I turned into a method developer: I pretty much stopped analysing data myself and developed methods, basically R packages for bioinformatics data. Now, let me give a bit of an explanation of why I mostly use Bayesian frameworks. I don't always necessarily do that, but I think that very often they give you a natural framework for dealing with biological data. Now, is there a question in the chat? OK, I don't see it; I hope it's not about the audio or something like that. It was me telling everyone that they can turn it on. OK, yes. So I normally go for a Bayesian approach with biological data because, first, you very often have latent variables: the measurements are noisy and stochastic, and that introduces a latent variable for the unobserved original population of mRNA or proteins. That can easily be dealt with in Bayesian statistics, because you just treat the latent variables as parameters and sample them within your framework. In frequentist statistics, dealing with latent variables is actually quite challenging: either you plug a value into those variables, but then you neglect the uncertainty, or you use multiple imputation approaches, but imputing those variables multiple times is not an easy job. In Bayesian statistics you just impute from the posterior distribution of those variables, and that is a very natural approach. The second advantage with biological data is that very often you have prior information. That could come either from other studies or from other genes: in bioinformatics you often analyse thousands of genes together, so when you analyse a single gene you can still bring in information from the other genes. And that, again, can be done in a very elegant and practical way with an informative prior. This is why I mostly use a Bayesian framework when analysing biological data. As I said, my research is split into two major areas, and today I'm going to focus on the first part, the one in systems biology, which is probably a bit heavier from a mathematical point of view; tomorrow I'll look at some of the methods I developed in bioinformatics. So this is going to be something like a two-seminar talk. Let me start with the first project. In this project I analysed the movements of the Nrf2 protein, which is a transcription factor. It basically plays an important protective role in cells, because it regulates the expression of some antioxidant genes; excessive oxidative stress may contribute to several diseases. The aim of the study was to improve the knowledge of the system by studying data that basically look like this.
We have fluorescent reporter images of Nrf2 in a cell, where you can see that there is a separation you can draw between the nucleus and the cytoplasm. We actually, semi-manually, semi-automatically (and that took a long time), place a border around the cell and around the nucleus, and you can already see that there is some noise in the measurement, because there are various places where you could put this border. Once you do that, you can compute the average intensity in the nucleus and in the cytoplasm, and get two numbers that give you the average light intensity of your Nrf2 proteins in the nucleus and in the cytoplasm. If you do that every two minutes across many hours, you get a bivariate time series that looks like this, with the nucleus on the top and the cytoplasm on the bottom. You can see that there are some translocations between nucleus and cytoplasm, some oscillations: here the nuclear Nrf2 drops while in the cytoplasm it jumps; the same thing happens there, a drop in the nucleus and an increase in the cytoplasm, and there again. And you see the opposite behaviour too: when there is an increase in the nucleus, there is a decrease in the cytoplasm. So obviously the two are correlated: when Nrf2 moves from the nucleus to the cytoplasm, it increases one of the two time series while it decreases the other one. However, you see that there is quite some noise in the balance between the two, because there is not a perfect correspondence, and that is due to several factors. First of all, the volumes are quite different, so when there is a movement from the nucleus to the cytoplasm and vice versa, the change in light intensity it generates is very different: the protein obviously dilutes a lot more in the cytoplasm than it does in the nucleus, because the cytoplasm has a larger volume. Secondly, there are many other things going on: you don't just have this movement, you also have synthesis of new proteins, degradation in the nucleus and degradation in the cytoplasm. And obviously you have noise in the measurements that we make; for instance, these borders change across time, because the cells move and we have to follow them over time. So this is a very noisy process. But within this noise, which comes from both the biology and the measurement, we want to try to model these oscillations and the biological process underneath, to study what is driving them and to infer some useful biological parameters. To do that, the first thing we have is a biological scheme that describes these movements; this obviously came from our collaborators. Here you have the cytoplasm on the top and the nucleus at the bottom. You can see that Nrf2 is normally kept in the cytoplasm in a complex with Keap1 and PGAM5, but when this complex is disrupted, it can enter the nucleus, where it starts transcribing genes; it is then phosphorylated via Fyn, exits the nucleus, and is bound again to this complex. You can furthermore see that there is an indirect regulation between the cytoplasmic and the nuclear abundance, because when the complex is disrupted, on one hand Nrf2 enters the nucleus.
But on the other hand, PGAM5 acts as a phosphatase on Fyn, which then enters the nucleus and phosphorylates Nrf2, allowing it to exit. So there is an indirect regulation between the nuclear and the cytoplasmic Nrf2, which is very complex to model. Obviously, we would like to observe and model this entire scheme, but that is not possible in practice, because we don't observe all of those players: we don't observe Keap1, PGAM5 and Fyn, we only observe Nrf2. So we greatly simplified that complex scheme into a reaction network with five reactions: Nrf2 in the cytoplasm can move into the nucleus, exit the nucleus again, and then you have synthesis, which can happen in the cytoplasm, and degradation, which can happen both in the cytoplasm and in the nucleus. This is a simplified scheme that tries to summarize the key features of the more complex one. To this scheme with five reactions we associate, as usual for a reaction network, a stoichiometry and a set of rates. First of all, we have this vector here that models what happens to our system when each one of these reactions occurs: the first element gives the change in the nucleus and the second the change in the cytoplasm. When there is a nuclear import, a molecule moves from the cytoplasm to the nucleus, so there is a plus one in the total abundance in the nucleus and a minus one in the cytoplasm. When a molecule moves from the nucleus to the cytoplasm, there is a decrease of one molecule in the nucleus and an increase of one in the cytoplasm. When you have synthesis, there is an increase of one molecule in the cytoplasm; and with degradation, there is a decrease of one molecule either in the nucleus or in the cytoplasm, depending on where it happens. In a reaction network we also associate rates to these events, and here you see these rates, or hazards, which basically indicate how likely these reactions are to happen. In particular, we have a linear rate for the nuclear import, again from the cytoplasm into the nucleus. We have a more complex nonlinear rate for the opposite movement, where we also have a delay. I'm not going too much into the details, this is not so important, but the delay tries to make the model as realistic as possible: it models the fact that when Nrf2 enters the nucleus, technically we don't allow it to exit straight away, because it has to undergo a process where it is phosphorylated, and this process typically takes time. There is this intermediate waiting time until Nrf2 is phosphorylated, and only then can it exit the nucleus. And then you have again a constant synthesis rate and linear degradation rates. You don't have to fully understand the mathematics behind it, but what is important is that we have formulas for these rates. These formulas matter because this is a continuous-time process and these hazards change all the time, because as x changes the hazards change; but if you take a very small time interval dt, then the hazards are stable, because the x's, the molecule numbers, are stable, and therefore the hazards are stable.
These hazards then approximate the probability that each reaction happens in a very small time interval. So you can basically model this as a birth-and-death Markov jump process where, over a small interval, the number of times each reaction fires can be approximated by independent Poisson counts that depend on these hazards and on the length of the interval. Again, this only holds for a small time interval, where the x's are stable and therefore the hazards are also stable. Now, as I said, this is technically a continuous-time process, but we only observe it at discrete time points. So we use the diffusion approximation, which basically associates a stochastic delay differential equation with the previous birth-and-death process where, again without going into many details, you have a mean and a variance that depend on the previous hazards. You can see that the mean change in the nuclear abundance is basically given by the hazard for the nuclear import, minus the nuclear export, minus the nuclear degradation; in the cytoplasm you have the opposite, the nuclear export minus the import, plus the synthesis, minus the cytoplasmic degradation; and similarly for the covariance matrix. We then use a second approximation that basically gives us a normal likelihood for the changes between two consecutive time points, with the mean and variance defined before. So although there were quite a lot of formulas, I hope I can catch up with the few who got lost here, because from this moment on we just have a normal likelihood, and what we aim to model is the difference between two consecutive time points. If you look at these points, they are very close to each other, so it is quite reasonable to assume that the hazards are quite stable across consecutive points. We just take the difference between two consecutive time points, which gives a bivariate change, and to that we fit a bivariate normal with mean and variance that depend on the hazards we have defined. Now, the natural question is why we used a stochastic model instead of a deterministic one, because very often in biology deterministic models are used. Deterministic models like ordinary differential equations work very well when you model the average signal in a large population: you have a lot of data, you average many cells, and you look at the average; deterministic models describe quite accurately where the average is going. But in single-cell data there is a lot of noise, so deterministic models are not realistic, and in particular, as we have shown, deterministic models do not oscillate for this system: the deterministic model would basically be a flat line, because this system only oscillates when there is noise, and so we actually need a stochastic one. I mentioned before that we have to worry not only about the biological noise but also about the measurement noise, so we need to briefly describe how we actually obtain our measurements. Normally you have DNA that transcribes into mRNA, which translates into a protein, and this is the object you would like to infer. But that is not what we observe. What we do in practice is take DNA fragments, engineer them, and plug them into the cells. The engineered DNA then transcribes into a reporter mRNA, which translates into a reporter protein, and when we stimulate this reporter protein with a laser, we obtain a light intensity. And that is what we measure.
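To make this step concrete, here is a minimal sketch, not the code from the talk, of how the bivariate-normal increment likelihood under the diffusion approximation could be written down. The parameter names (k_in, k_out, k_syn, k_deg_c, k_deg_n) are illustrative, and the delayed, nonlinear export hazard described above is simplified to a linear one.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Stoichiometry matrix S: columns = reactions
# (import, export, synthesis, cytoplasmic degradation, nuclear degradation),
# rows = (nucleus, cytoplasm).
S = np.array([[ 1, -1, 0,  0, -1],
              [-1,  1, 1, -1,  0]])

def hazards(x, theta):
    """Reaction hazards at state x = (x_nuc, x_cyt); illustrative parameterization."""
    x_nuc, x_cyt = x
    k_in, k_out, k_syn, k_deg_c, k_deg_n = theta
    return np.array([k_in * x_cyt,      # nuclear import
                     k_out * x_nuc,     # nuclear export (simplified: no delay term)
                     k_syn,             # synthesis in the cytoplasm
                     k_deg_c * x_cyt,   # degradation in the cytoplasm
                     k_deg_n * x_nuc])  # degradation in the nucleus

def increment_loglik(x, dx, theta, dt):
    """Gaussian log-density of the change dx between two consecutive time points."""
    h = hazards(x, theta)
    mean = S @ h * dt                    # drift: import - export - degradation, etc.
    cov = S @ np.diag(h) @ S.T * dt      # diffusion term of the approximation
    return multivariate_normal.logpdf(dx, mean=mean, cov=cov)
```

Summing this log-density over all pairs of consecutive observations would give the (approximate) likelihood of one cell's bivariate time series, under the assumption that the hazards are roughly constant over each two-minute interval.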
So this light intensity is a noisy measurement of this reporter protein. The assumption behind this, which is quite reasonable, is that the reporter protein behaves similarly to the original protein we are studying. And we do have the issue of this noisy measurement, because in reality in the cell we have counts, numbers of molecules of mRNA and protein in the nucleus and in the cytoplasm, but that is not what we observe: we observe a light intensity. We therefore have to add a measurement equation that relates the unobserved population of molecules in the cell, x, to the actual observations y. We use the classical assumption that the observations are proportional to the biological abundances, with a stochastic noise on top of that. The proportionality constant kappa is different between nucleus and cytoplasm, because they have different volumes, so we cannot assume the same one, and the same holds for the error: we assume a normal error where, again, the nuclear and cytoplasmic abundances have different variances. But apart from the mathematical details, the general key point is that we have a continuous biological process which is only observed every delta, in our case two minutes. And again, we would like to observe the x's, which are the biological populations of proteins, but we actually observe these noisy measurements y of the x's. The aim is to disentangle the biological noise from the measurement noise, because we want to remove the measurement noise, which is a nuisance here, something we don't want to infer, in order to do a proper inference on the biological variability that we are interested in. From a Bayesian perspective, the way we deal with this is to use a data augmentation approach, and here is the first of the two reasons I mentioned about why Bayesian models are useful, because here we have latent variables: the x's are latent variables for the unobserved populations of molecules. So we use an MCMC scheme where we sample in an alternating way from the conditional distributions: first the parameters, which are basically the biological and the measurement error parameters, given the x's and the observations y; and then the x's themselves, so we sample the biological abundances given the parameters and the observations y. We basically treat these x's, the populations of proteins, as parameters: we simply enlarge our posterior space and have a lot of parameters to sample; it doesn't really matter that they are populations of proteins. Now, feel free to interrupt me anytime, because I will just keep going unless I'm stopped. This scheme is valid for a single cell, but in general we observe multiple cells, and every cell gives us a bivariate time series. So we embed this in a Bayesian hierarchical framework which, if you are not familiar with it, is exactly the parallel of a frequentist mixed effects model. Every y here at the bottom is a different cell that gives us a bivariate time series, and every cell is associated with a vector of, as you see, nine parameters that change from cell to cell. So we actually model the biological variability between cells: each cell is allowed to have a different parameter vector.
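As a rough illustration of the structure just described, here is a sketch, under assumed and simplified notation, of the measurement model (observed intensities proportional to latent molecule numbers, compartment-specific scale and noise) and of the alternating data-augmentation MCMC. The update functions are placeholders standing in for the real conditional updates.

```python
import numpy as np
from scipy.stats import norm

def measurement_loglik(y, x, kappa, sigma):
    """y, x: arrays of shape (T, 2) for (nucleus, cytoplasm) intensities and abundances;
    kappa, sigma: length-2 vectors, one proportionality constant and noise sd per compartment."""
    return norm.logpdf(y, loc=kappa * x, scale=sigma).sum()

def data_augmentation_mcmc(y, n_iter, update_params, update_latent_path, theta0, x0):
    """Skeleton of the alternating scheme: parameters given latent paths, then latent
    paths given parameters.  update_params / update_latent_path are hypothetical
    placeholders for the two conditional updates, not real implementations."""
    theta, x = theta0, x0
    draws = []
    for _ in range(n_iter):
        theta = update_params(theta, x, y)      # sample from p(theta | x, y)
        x = update_latent_path(x, theta, y)     # sample from p(x | theta, y)
        draws.append((theta, np.copy(x)))
    return draws
```

The point of the sketch is simply that the latent molecule numbers x are carried along as extra "parameters" in the sampler, which is the data augmentation idea described above.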
But obviously it would be a waste of information to model every cell independently, because cells behave differently but also similarly to each other. So we have a common prior here that allows for sharing of information across cells, so that when we infer the parameters of every cell, we infer them from the data that come from that cell, but also from the common hyper-prior, which is inferred from the other cells. This combination of cell-specific parameters and sharing of information is obviously not something specific to our model; it is a general property of hierarchical models. Then, I don't know how much you have seen of MCMC, so I'm not really going into details, but if you know enough to follow, I'll give you a few details. For every cell we have, as you see, five biological and four measurement error parameters. We don't update all of them together, because that would be very hard to do; we put them in blocks, I think three or four blocks, where each block contains parameters that are highly correlated with each other, like synthesis and degradation, which are very correlated. We then use a Metropolis sampler to update each block. In particular, we use what is called an adaptive random walk: I don't know if you have seen this, but in a Metropolis sampler you need to propose values and then accept or reject them depending on their posterior. We don't propose them from a diagonal covariance matrix; instead, the covariance matrix of our proposal follows the posterior, so that we propose values which are more coherent with the shape of the posterior and are therefore more likely to be accepted. Again, this is nothing fancy, it is a very classical scheme; I use very basic Bayesian methodology. After that, obviously, we had to validate the method in simulation studies: we simulated data with the Gillespie algorithm, tried to re-infer the parameters, and made sure that the inferred parameters are close to the original ones we simulated from. Here we get to the second advantage of the Bayesian approach in this particular analysis, which relates to the informative priors. We actually have very informative priors for several parameters. First of all, the degradation: there is another study where they did pretty sophisticated experiments that we didn't have to redo, in which they inferred the degradation rate of Nrf2, and we used that as a prior for our degradation rates. Secondly, the measurement error: as you have seen, the measurement process is quite noisy, we have to define a border, and that is not really an automatic procedure. What we did was to repeat that procedure multiple times on the same data. The differences between the datasets we obtain can only come from the measurement process, because the biological process is the same, it is the same actual data. We did that multiple times, estimated the measurement error parameters, and then used these initial estimates as informative priors for those parameters. That was absolutely crucial, because you have a bivariate time series and a combination of nine parameters that can generate those time series, and those are very correlated parameters.
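For readers less familiar with the adaptive random walk idea mentioned above, here is a generic sketch in the same spirit (a Haario-style adaptive Metropolis for one block of parameters). It is not the authors' sampler; the proposal covariance simply tracks the empirical covariance of the chain so that strongly correlated parameters, like synthesis and degradation, are proposed jointly.

```python
import numpy as np

def adaptive_rw_metropolis(log_post, theta0, n_iter, adapt_start=500, eps=1e-6):
    """Generic adaptive random-walk Metropolis for one parameter block.
    log_post: function returning the log-posterior of the block (assumed given)."""
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    chain = np.empty((n_iter, d))
    lp = log_post(theta)
    cov = np.eye(d) * 0.1
    scale = 2.38**2 / d                          # classical optimal scaling factor
    for i in range(n_iter):
        if i > adapt_start:                      # adapt proposal to the chain history
            cov = np.cov(chain[:i].T) + eps * np.eye(d)
        prop = np.random.multivariate_normal(theta, scale * cov)
        lp_prop = log_post(prop)
        if np.log(np.random.rand()) < lp_prop - lp:   # Metropolis accept/reject step
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain
```

In practice one would run something like this separately for each block of each cell, within the outer data-augmentation loop sketched earlier.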
There are many combinations of those parameters that will give you extremely similar values of the likelihood and of the posterior, and it is going to be very hard to identify the right combination. It is absolutely crucial, with this kind of data and this kind of analysis, to have informative priors that somehow help you to put a limit on some parameters and constrain them to a limited region. OK, so briefly, here are a few results about the posterior distributions. We have 10 parameters; I don't remember anymore why we have 10, that was a while ago, that was 2013. You can see blue and red lines: they refer to the two experimental conditions we analysed. We have 35 cells in one condition and 36 in the other, which means 71 cells, and plotting all of those densities is very hard to read. Instead, we look at the hyper-parameters, which basically represent a summary across all the cells at the group level; we have two groups, which means two hyper-mean parameters. The first thing to note, without going into details again, is that we infer useful quantities that are informative for our biologists, like synthesis and degradation rates, or the ratio between the nuclear and the cytoplasmic volumes. Importantly, we show, for instance, that the export from the nucleus to the cytoplasm is about three times faster than the import, and that when we stimulate the cells, both movements are accelerated, so Nrf2 basically becomes a bit more dynamic and moves more rapidly between nucleus and cytoplasm. Finally, we did a quite sophisticated mathematical analysis; I didn't really do much of this myself, it was pretty much guided by David Rand. Here we basically tried to investigate how stable the system is around the stationary solution. Again skipping the mathematical details, what we found is that the system behaves, most of the time and in most of the cells, like a noise-induced oscillator, which is what I mentioned briefly before. That means that the system does not oscillate in a deterministic context: when there is no noise, the system is flat, it converges to an equilibrium. But when you reduce the population of molecules and introduce noise into the system, so with a limited, finite population, then the system very often oscillates, and that is a peculiar biological feature. We also noticed that this behaviour is more frequent when the cells are stimulated, roughly two thirds of the cells, compared to when the cells are not stimulated, roughly one cell out of two. This again is reasonable because, when you stimulate cells, you have more noise, and so they tend to display oscillations more often. So, very briefly wrapping up this first part, and I'll try to keep it short: we wanted to study this system because it plays an important role in protecting cells, and to do that we studied these oscillations between nucleus and cytoplasm with the framework we developed and applied to data. The key findings are that the biological parameters we infer are useful per se, because they tell us more or less what the rates of the import, the export, the degradation and so on are; and, importantly, the main finding is that this system only displays oscillations in the presence of stochastic noise, and not in a deterministic setting.
Unfortunately, we never published that one; I really hope I will be able to write it up in the next year. I'll give you a minute to recollect your thoughts and then I'll get into the second part of the talk, which is about a completely different project. It has nothing to do with the previous project, so it really is a block of two seminars that are unfortunately not connected in a continuum. But I think this is a simpler project to follow; there is less mathematics here. So here we are talking about flow cytometry data; it is still single-cell data, stochastic models and systems biology, but here we try to model transcription, and we have a two-state model for transcription. The interesting part here is that this is again a process that evolves over time, but we don't observe it across time, we only observe it at a single time point. So it is actually challenging mathematically, because we try to model a time-evolving process while observing it only at a single time point. Let me know if there is a question for now or if I should go ahead. There is a question from the previous part of the talk. Yeah, I think that's fine. Could you please comment on the form of the prior distributions used in the first study? Sorry, say that again? The prior distributions: can you comment on their form, the actual formulation? I think the question is about the shape, meaning the actual formula for the priors. Maybe you can speak up? OK, he says yes, the shape. So it's quite simple, because all our parameters here, if I'm not wrong, and again that was like ten years ago, are positive; none of these rates is negative. So it is very natural to work with logs, and we actually work with the log of these parameters: the log of a positive quantity spans the whole real line, and on the log scale it is a very natural choice to have a normal prior. So we use log parameters with a normal prior. For most of these parameters we use vaguely informative priors, which basically means we try to have as little information as possible. But for these four parameters here, we still use a normal prior on the log of the parameters, with mean and standard deviation identified from those studies. So in this case, for instance, for the measurement error we have repeated measurements, so we can actually estimate the mean and the standard deviation of log sigma, and that is exactly what defines our prior: log sigma has a prior with mean and standard deviation identified from that analysis. The degradation rate here was a little bit more arbitrary, because we just had a point estimate, not a standard deviation. So for the log of the degradation we took that estimate as the mean on the log scale, and for the standard deviation I think we chose something that was kind of reasonable, but it was a bit arbitrary. Did I answer your question without going off topic? Yes, he says yes, thank you very much. He's on the train, that's why he cannot talk. So, maybe just a minute, any other questions on this part? Yeah, Isabelle.
I have a question concerning the noise that you described at the end: is this noise related to the experimental variability or to the biological one? No, biological, I mean, it is biological variability. The thing is that if you have a large population there is little noise, and the system does not oscillate, it is flat. If you have a limited population, then there is biological noise, and the system often oscillates. If you had taken more cells, would this noise have been reduced? Well, this is still at the single-cell level, we are still talking about single cells. If you have a single cell with a huge population of molecules, then the noise is decreased and it is less likely that it oscillates. As the population goes to infinity, the oscillations completely disappear; the more stochasticity there is, the more likely it is to oscillate. And when you average many cells, you completely lose this behaviour, because this behaviour is in single cells: when you average many, the behaviour is averaged out and it becomes essentially constant. OK, thank you. Andrea has a hand up. Yeah, I'll try to make sense, I hope so. So in case, for example, you have another treatment, then you will have noise for two treatments, and you could also use a hierarchical model for that; but how would you define the prior in that case, would you have a mean for each one, or how would you handle it? What do you mean? Can we try again? So, for example, here you have one experimental sampling group, so you can summarize everything with just one group; in case you add another sampling group, say the cells are affected by another kind of treatment, whatever, can you still use this hierarchical model? Yes, and I forgot to clarify that the hierarchy is within the same experimental condition: here we have 35 basal and 36 stimulated cells. So this hierarchy is for the basal cells, which are analysed separately from the stimulated ones: the 35 basal cells have one hierarchy and their own analysis, and the 36 stimulated cells have a different hierarchy and a different analysis. I forgot to clarify that each experimental condition is analysed separately. And so the mean and the standard deviation for each group, I assume, are different? Yes, and that is exactly what you try to study. This is basically what you have here: this is the hyper-mean, the group-level mean, for the basal condition in blue and the stimulated one in red. So they are different, and that is actually what you aim to study; among other things, you also want to understand the differences between these two parameters. And then just a really basic question about the difference between those two posteriors: I don't mean the mathematical difference, I mean, how would you assess a change? You have a posterior for one group and a posterior for another group, two hierarchies; how do you say there are differences if you are getting a posterior for each hierarchy?
So you mean whether you can build a statistical test to verify that? Exactly. You could also build a statistical test; we didn't do that. OK, that's what I meant. That was not a priority, also because to do that you really need a lot of information, and there is a lot of noise in your estimates, because there is a lot of noise in the data; but you do see that there is a clear shift, and you can take that, for instance, as visual evidence. With a clear shift you say it is clear enough. For instance, these parameters indicate how quick the import and export are, and in the stimulated condition they are faster, which by the way also makes sense, because you stimulate the cells, so you expect a change of that kind; and compared to the other parameters, you see that there is a stronger change in these parameters than in the others. Yeah, I got it. And just one last thing: is it really not possible to have just one hierarchy for everything, all together? Is it computationally not possible, or statistically? Well, it doesn't make sense biologically, because if you fit that model you are basically saying that the same parameters are shared across basal and stimulated cells, and that is inaccurate, right? These are different samples; it is reasonable to assume that they follow different parameters, and again you are probably interested in studying those differences. If you assume they are the same then, first of all, that is probably not accurate, and secondly, you cannot study the difference between them anymore. And you don't really need it, because with 35 cells you have enough sharing of information between them, whether you have 35 or 70. Yeah, you're right, now I got it, thanks. Simone, I have a question: if you were to compare the two conditions using these posterior distributions, would you compare the distributions themselves, or rather the means, or something like that? You can do various things; we haven't really done a test. You have various things here: you have the hierarchical densities of the parameters, but again that is 71 densities. What we did was basically to look at the hyper-mean, the group-level mean: the posterior distribution of the hyper-mean parameter for each of our nine or ten parameters, and to see how that changes. I think the main point was actually not to compare the two, but to use these estimates in the linear stability analysis: there we basically replace these numbers with the estimates we got from the model, and that brought us to the oscillation conclusion. OK, thank you. There is also a question in the chat: how are these posterior graphs interpreted, could you explain one of these graphs? Well, again, I haven't actually shown this, but you have these parameters and then you have a hyper-mean; for each one of these nine parameters you have a hyper-mean and a hyper-variance, or standard deviation.
And so you actually have several hyper-parameters at the group level that you can plot; we tend to focus on the mean. So this is the posterior density of the hyper-mean parameter for, I think these should be in order, so parameter kd, then k, then bk and so on. Each one of these plots shows a different biological or measurement parameter. And the density is just the posterior: from the MCMC you get basically a very long vector of values, and from that vector you can build the density. That is what you aim to do, right, you aim to approximate the posterior density with a finite sample, so this is the density from your final sample under the two conditions. I'm not sure I answered the question, but I think so. OK, I think so too; I hope it wasn't off target. There is also a follow-up: how are these interpreted biologically? Sorry? What is the biological interpretation of these plots? Ah, biologically, well, that is a different thing. These plots give you estimates of your parameters, and the interpretation is kind of the next step. This is a general statistical problem: when you produce an estimate, how do you interpret it? Well, you go and talk to the biologists and you say, look, for this parameter this is my point estimate and this is, say, a confidence or credible interval around it, and you discuss with them what it implies. One of the things we found, for instance, with these parameters is that the export is three times faster than the import; that is one biological interpretation. Or, again, we used them to get the other biological interpretation about the noise-induced oscillator. So it really comes down to going back to the biologists and saying, these are the numbers I estimated, regardless of whether they are Bayesian or not. It is probably a little bit harder to interpret Bayesian densities rather than single numbers, but these densities can be summarized in numbers: you can easily get a point estimate, say the posterior mode, and a posterior credible interval, which is easier to interpret. Thank you, the answer is very clear. You made my day, little smile. Right. OK, I will anticipate future questions and go ahead. It sounds a bit arrogant actually, but I put the name of the paper here because, in case you are interested, you can look at our paper; this one at least was published, unlike the other one. So, again, this second project is about gene transcription. We actually examined a few models of transcription and we tried to make a more realistic one. Model number one is the simplest scenario: the gene is constantly active, and it transcribes mRNA at rate alpha, which then degrades at rate beta. This is again a birth-and-death process, meaning that every transcription event is a birth and every degradation is a death, with exponential waiting times. And you can see that, although this is a process that evolves over time, the stationary distribution is a Poisson, meaning that if you let it run for long and record several values, those values will be distributed like a Poisson, where the rate of the Poisson is basically the synthesis rate over the degradation rate. This model is typically not realistic.
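As a quick, hedged illustration of that last claim for model one, here is a plain Gillespie simulation of the constitutive birth-and-death process; with arbitrary rates alpha and beta, the long-run values should have mean and variance close to alpha/beta, as a Poisson would. (Samples are taken at reaction times rather than weighted by holding times, so this is only a rough check.)

```python
import numpy as np

def gillespie_birth_death(alpha, beta, t_end, x0=0, seed=0):
    """Gillespie simulation of constitutive transcription (birth rate alpha)
    and degradation (death rate beta * x); returns the visited copy numbers."""
    rng = np.random.default_rng(seed)
    t, x, samples = 0.0, x0, []
    while t < t_end:
        rates = np.array([alpha, beta * x])       # birth and death hazards
        total = rates.sum()
        t += rng.exponential(1.0 / total)         # exponential waiting time
        if rng.random() < rates[0] / total:       # choose which reaction fires
            x += 1
        else:
            x -= 1
        samples.append(x)
    return np.array(samples)

x = gillespie_birth_death(alpha=20.0, beta=1.0, t_end=5000.0)
print(x.mean(), x.var())   # both should be close to alpha / beta = 20
```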
Because the data, at least fluorescence data, are normally heavily over-dispersed compared to the Poisson: the variance is much larger than the mean. I'm not sure if you remember, but in the Poisson the mean has the same value as the variance, which is a very strong assumption. A more realistic model that extends this one is model two, where you have an active and an inactive state: the gene alternates between an on state and an off state, again with exponentially distributed waiting times governed by rates k1 and k0. When it is off, it does not transcribe at all, it is completely flat; when it is on, it transcribes mRNA. This model is a lot more realistic: it accounts for over-dispersion and, importantly, it models transcriptional bursts. Transcriptional bursts are short periods of time in which a large amount of mRNA is transcribed: typically the gene is mostly inactive, and rarely turns on and transcribes a lot. That is what typically happens. And if we look again at the stationary distribution, so we let it run for long and take several values, it is going to follow a Poisson-beta distribution, where basically your transcription is distributed like a Poisson with rate alpha times P, where P models the on/off state of the gene and is itself distributed like a beta. This one obviously has more noise, as you see, because we have put here something which is not a constant: a parameter of the Poisson is itself a random variable. One thing to note is that these parameters, the alpha, k1, k0 and so on, cannot be estimated directly; they have to be rescaled with respect to the degradation, simply because not all parameters are simultaneously identifiable. The same happens in the simpler model: you can only identify the ratio of the parameters. It is still useful, because you can still identify parameters; it is just that they are expressed in units of the degradation rate. So this is the model I just described, and we actually further extend it by adding one arrow here: we use again this two-state model, but now we allow for transcription in the off state as well, because we think it is not realistic to assume that when the gene is off it is completely dormant and transcribes nothing. So we allow for some transcription in the off state, where obviously this is going to be lower than in the on state, and typically much lower. Now, the stationary distribution of this model (again, you let it run for long) had already been derived, but in our work we showed that, again, you can write it as a compound of a Poisson and a beta. And it is quite intuitive: the mRNA follows a Poisson distribution where you have a kind of base level of transcription, which is what you have in the off state, but when the gene is on, so when P is one, you also have an additional contribution, which is the difference between the on and the off transcription rates. So basically when the gene is off you have alpha zero, and when the gene is on you have alpha one, because that is what you see here. And again, the on/off state of the gene is distributed like a beta, and all parameters are rescaled with respect to the degradation.
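To make the compound representation concrete, here is a minimal sketch of sampling from the stationary distribution of this leaky two-state model. The names (k1, k0, alpha1, alpha0) are illustrative, k1 is taken to be the switching-on rate, and all rates are understood as already rescaled by the degradation rate; the parameter values are arbitrary.

```python
import numpy as np

def sample_leaky_two_state(k1, k0, alpha1, alpha0, size, seed=0):
    """Draw mRNA copy numbers from the Poisson-beta representation of the
    leaky two-state model: easy to sample even though the density is awkward."""
    rng = np.random.default_rng(seed)
    p = rng.beta(k1, k0, size=size)                 # gene "on" propensity
    lam = alpha0 + (alpha1 - alpha0) * p            # leaky baseline plus burst contribution
    return rng.poisson(lam)                         # latent mRNA counts

x = sample_leaky_two_state(k1=0.3, k0=2.0, alpha1=200.0, alpha0=2.0, size=10_000)
print(x.mean(), x.var() / x.mean())    # variance/mean far above 1: over-dispersion
```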
The density of this stationary distribution, as I said, was already available, but the density of x is actually very unstable and very hard to compute numerically. With this compound structure, instead, we simplify our inference, as I will explain later, because we actually never compute the density of x: it is very hard to compute the density, but it is very easy to sample from this distribution. OK, here is just a simulation to show how transcriptional bursts typically behave; this is a long trace, and here is a zoomed-in area. In red you should see when the gene turns on and when it is off. In the simulation it only turns on for short time intervals, and when it does, there is a lot of transcription, so the mRNA jumps and then degrades; there is a burst with high transcription, and then it degrades again. This is how the mRNA behaves under our model. And the stationary distribution of this kind of model, so if you were to look at the density horizontally here, looks like this: it is significantly over-dispersed compared to the Poisson. Again, as in the other case, we have the biological noise that I roughly described, but we also have a source of noise that comes from the measurement. In this case the process is quite different, because here we do not engineer the DNA: we have the original DNA, which transcribes an mRNA, and then we put a fluorescent tag on this original mRNA. We again do a laser stimulation and record a light intensity. And again the measurement of the light intensity is noisy, because we have a population of mRNA molecules, but we only observe an intensity of light. So once again we use a measurement equation with a proportionality constant (this case is univariate, so there is only one kappa) and, on top of that, a random noise, a stochastic error. Note that in this case there is a mean parameter in the error, because we actually detect background noise: we have a positive error with a positive mean that basically captures the background signal. So we get to our parameter vector, which is given by four biological and three measurement parameters, and again we try to separate the two sources of variability. In this case we proceed in a different way compared to what we have seen before: we still have a latent state for the population of mRNA molecules, but while before we used a data augmentation approach and sampled them, here we try to integrate them out, and I will explain later why we chose a different approach. Our data are these y's here, so to do our inference we would like the marginal likelihood of the data. But, as I said, that is basically an integral with respect to the population of molecules x, and this integral is obviously not tractable. Instead, rather than computing the density of x, we sample from it. Since we can easily sample from this distribution of x, if we sample a large number of particles, a large number of values of x that we call z, we can approximate this integral with this function here, and this is an unbiased estimate of that marginal density.
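Here is a sketch, with illustrative parameter names and ordering, of that unbiased Monte Carlo estimate of the marginal likelihood for a single observed cell: draw many latent copy numbers z from the Poisson-beta stationary distribution (easy to sample, hard to evaluate) and average the measurement density over them, on the log scale for numerical stability. One common way to extend this to a whole dataset is to sum such terms over cells, drawing fresh particles for each cell so that the product over cells stays unbiased; whether that is exactly the construction used in the paper is not stated here.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def log_marginal_estimate_one(y_j, theta, n_particles=5_000, rng=None):
    """Log of an unbiased Monte Carlo estimate of p(y_j | theta) for one cell.
    theta = (k1, k0, alpha1, alpha0, kappa, mu, sigma); names are illustrative."""
    if rng is None:
        rng = np.random.default_rng(0)
    k1, k0, alpha1, alpha0, kappa, mu, sigma = theta
    p = rng.beta(k1, k0, size=n_particles)                 # gene activity
    z = rng.poisson(alpha0 + (alpha1 - alpha0) * p)        # latent mRNA counts
    logw = norm.logpdf(y_j, loc=kappa * z + mu, scale=sigma)   # measurement density p(y_j | z)
    return logsumexp(logw) - np.log(n_particles)           # log of the particle average
```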
Now, in the MCMC algorithm, in a Metropolis sampler, you normally have to propose a value and then accept or reject it depending on the posterior, but the posterior depends on the likelihood, which here we are not computing. So we use what is called a pseudo-marginal approach, where the posterior is replaced by an unbiased estimate, which is this one here. So we do not use the exact marginal likelihood of y, but an estimate of it, which is then used in the posterior. And since the estimate is unbiased, the chain still converges to the right posterior. Now, as I said, in the other case we used a different approach, a data augmentation approach, and there is a reason why we have not done that here. Here, for every sample, we have seven parameters, but we have many observations, about 1,000. Using a data augmentation approach would have meant having roughly 1,000 values of x and 1,000 values of p, which by the way are very correlated. So instead of having seven parameters, we would have had 2,007 parameters to explore, which would have been really, really hard, and also these parameters are very correlated, so it would have been much harder to explore the posterior space. With the pseudo-marginal approach we integrate out the latent variables, and so we have a much smaller posterior space to explore, which simplifies our inference: we get better mixing and convergence of the posterior chains. Again, this is valid for a single replicate. In the Nrf2 case, one cell gave us multiple measurements; here, each cell gives us only one measurement. So it is a different context, but we still have a hierarchical model: before we had the hierarchy on the cells, here we have the hierarchy on the replicates. We have four replicates, and each replicate gives us 1,000 measurements, so the y here is a vector of length 1,000 made of 1,000 cells, and each replicate has a parameter vector of length seven. Again, each replicate has a different parameter vector, but there is sharing of information, so we also use the information from the other three replicates. It is the same framework, applied in a different context: here the hierarchy is on the replicates and not on the cells. I'm not sure whether this is more confusing or useful, but I'll still explain it: this is a picture of the graphical model. I don't know if you are familiar with this, but it should in theory simplify the intuition, although it may complicate it now. You have your hyper-parameters here, theta. Your hyper-parameters generate a bunch of hierarchical parameters, in our case four, the biological hierarchical parameters. Each one of these generates a bunch of real biological observations, which we don't see, but which are the real biological populations of mRNA, so this will be a vector of length 1,000 for each replicate. Then you also have your hierarchical measurement error parameters, and here your observations, where the observations depend on the real biological abundances as well as on the measurement error parameters. So basically we observe these y's, and we try to integrate out the x's in order to infer all these hierarchical and, importantly, hyper-parameters. If you got lost, you can catch up now.
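For completeness, here is a minimal pseudo-marginal Metropolis-Hastings sketch, not the authors' code: the exact likelihood is replaced by a noisy but unbiased estimate such as the one above, and, crucially, the estimate attached to the current state is stored and re-used, which is what keeps the chain targeting the correct posterior. Both log_lik_hat (for example, a sum over cells of log_marginal_estimate_one) and log_prior are placeholders assumed to be supplied by the user.

```python
import numpy as np

def pseudo_marginal_mh(log_lik_hat, log_prior, theta0, n_iter, step_sd=0.1, seed=0):
    """Pseudo-marginal MH with a symmetric random-walk proposal.
    log_lik_hat(theta, rng): unbiased (on the natural scale) likelihood estimate, returned as a log."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    ll_hat = log_lik_hat(theta, rng)              # noisy likelihood estimate for the current state
    chain = []
    for _ in range(n_iter):
        prop = theta + step_sd * rng.normal(size=theta.size)   # random-walk proposal
        ll_prop = log_lik_hat(prop, rng)
        log_alpha = ll_prop + log_prior(prop) - ll_hat - log_prior(theta)
        if np.log(rng.random()) < log_alpha:
            theta, ll_hat = prop, ll_prop         # keep the estimate with the accepted state
        chain.append(theta.copy())
    return np.array(chain)
```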
Again, the second major advantage, at least for me, of Bayesian statistics: informative priors. Once again we have a lot of parameters here, and this is a very challenging task, because we have a univariate density and we try to estimate seven parameters from it. For me that is basically an impossible task; it is almost non-identifiable, because there are many combinations of those seven parameters that give a very similar value of the density. So it is very important to use informative priors so that we can at least constrain some of the parameters. What we did here is, again, use extra information: for every replicate we had information about the background noise, so we had not only the measurements but also background data. We used that background data to get good estimates of the mean and standard deviation of the measurement error, and we used these estimates to build very strong informative priors, so that we are basically left with roughly five parameters to estimate, which is still a complicated task. Then we use a similar strategy as before, an algorithm called Metropolis-within-Gibbs, which simply means that you do the inference in blocks because you have many parameters; and again, within each block we use the adaptive random walk strategy, where we propose values following the correlation structure of the parameters. And again we did simulation studies to make sure we can roughly recover the parameters we simulated from. Finally, we fitted the model to real data. Here we have FISH data, fluorescence in situ hybridization data, of a variant of the HIV gene. We have two levels of stimulation, so again two conditions, four biological replicates, and each replicate has 1,000 observations. Here is the data summarized; this is probably easier to follow: four black and four red dotted lines, where the red dotted lines are the higher stimulation and the black ones the lower level of stimulation. So we fitted the model parameters, and maybe here I can partly answer the question you asked before, because here we focus a little bit on the interpretation of the parameters we infer. These inferred parameters are useful because they give us information. Here we actually show the hierarchical densities and not the hyper densities, because there are only four of them per condition, so I can afford to plot eight lines in total, while before it was pretty much impossible to see anything. A few things to note: if you compare the red and the black lines, you see that they are quite similar here, and these are the transcription rates, while they differ a lot here, and these are the rates at which the gene turns on and off. In particular, the gene turns on much more frequently, twice as frequently, in the higher stimulated condition compared to the lower stimulated one, meaning that the stimulation does not significantly affect the transcription parameter, but rather significantly affects the frequency with which the gene turns on and off. Secondly, we did some reparameterizations: basically we take the original parameters and transform them to estimate quantities which are interesting for us, as illustrated in the sketch below.
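Derived quantities like the ones discussed next can be computed draw by draw from the MCMC output. The formulas below are natural transformations of the two-state model (for example, fraction of time on equals k1/(k1+k0) under the naming convention used earlier), offered as a hypothetical illustration rather than the exact reparameterizations used in the paper.

```python
import numpy as np

def derived_quantities(draws):
    """draws: array of shape (n_draws, 4) with columns k1, k0, alpha1, alpha0
    (illustrative ordering).  Returns posterior samples of interpretable quantities."""
    k1, k0, alpha1, alpha0 = draws.T
    p_on = k1 / (k1 + k0)                       # fraction of time the gene is active
    leak_ratio = alpha0 / alpha1                # off-state vs on-state transcription rate
    off_share = alpha0 * (1 - p_on) / (alpha0 * (1 - p_on) + alpha1 * p_on)
    return p_on, leak_ratio, off_share          # e.g. summarize each by mode and credible interval
```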
So this one here is the ratio between the transcription rates in the off and in the on states, and you see that it is quite low, between zero and 4%, meaning that the transcription in the off state is significantly smaller than in the on state. Here on the right you see how often the gene is active, and you can see that it is mostly off: it is only active between 3 and 14% of the time, so roughly 5 to 10%. Most of the time the gene is off. Again, this is consistent with what we knew: the gene turns on rarely and transcribes in bursts, because it is mostly off and then, when it does turn on, it transcribes a lot. But the other piece of information we can extract here is that, although transcription in the off state is low compared to the on state, the gene is mostly off, so overall the off state contributes a significant amount of transcription: you see it varies quite a lot, but it is about 10 to 20% on average. So on average 10 to 20% of the transcription actually comes from the off state. In conclusion, I think that extra arrow in the model was actually helpful and improved the model's realism, because it accounts for this 10 to 20% of transcription, which would otherwise have been attributed to the on state. Lastly, yeah, it's 6:25. We can also infer the average population of original mRNA molecules, which in this case varies between a few tens and a few hundreds of molecules. This is quite a big interval, and the reason we have such a big interval is that this is a time-evolving process and we only measure it at a single time point. We can still do our inference, but remember that the results always reflect the uncertainty in the original data; in this case there is not a lot of information in the original data, and that is reflected in the width of the interval. And very lastly, you see here the ratio between the variance and the mean. If you remember, I said before that in the Poisson the mean and the variance are the same, so this ratio should be one; here we estimate this ratio to be between tens and hundreds. So there is a huge degree of over-dispersion, which greatly justifies using an over-dispersed model like the two-state on/off model. I'll skip the conclusions because it is almost 6:30. If you are interested, we have the paper here. I will conclude with a picture that I really like, of the David; if you haven't been to Florence, it is really a good place to spend a few days. So, happy to take a few questions about the second part.