It's my pleasure to introduce the last speaker of the day, Krista Fischer, who is also a PI in our network. Krista is a professor at the University of Tartu and a member of the Institute of Genomics, and she is an accomplished mathematician. She did her PhD in mathematical statistics in Tartu in 1999, was then a postdoc in Ghent, then associate professor of biostatistics in Tartu and an investigator scientist at the MRC Biostatistics Unit in Cambridge. Now she is associate professor of biostatistics at the Institute of Genomics and also professor of mathematical statistics at the University of Tartu. She was recently elected a member of the Estonian Academy of Sciences, in fact after the start of this ITN, so congratulations on that. She is also a member of what we would call in Switzerland the Estonian Corona Task Force: the Anti-COVID-19 Research Council advising the Estonian government. Krista always gives very insightful talks at the interface of statistics and the life sciences. I also remember a very funny talk she once gave comparing machine learning and statistics, and the key statement I remember from it was that in machine learning the grants are just one order of magnitude larger than in statistics, and that this is one of the main differences. All jokes aside, I'm looking forward to her talk on estimating biological age. Welcome, Krista, we are looking forward to your talk. OK, thank you very much for the nice introduction. Indeed, as Carsten said, I have two positions. My main role at the University of Tartu is to be professor of mathematical statistics at the Institute of Mathematics and Statistics, but I also have an affiliation at the Institute of Genomics, so my research is very much connected to the Estonian Biobank, and today all my examples are from the Estonian Biobank cohort. I have given a similar talk before.
Part of the slides are shared with a conference presentation from a year ago, which is why the survival status and the end date here are September 2020, but that doesn't change the message. The Estonian Biobank cohort is now a really huge biobank: given that the Estonian population is 1.3 million, having a biobank with about 200,000 participants is something we are really proud of. It is almost one sixth, or you could even say one fifth, of the Estonian adult population, as the minimum recruitment age is 18: you have to be an adult to give informed consent to join. Recruitment has now stopped, but there have been three waves of recruitment. The first wave, mainly in 2003 and 2004, recruited 10,000 participants. A couple of years later the second wave was conducted, mainly between 2008 and 2010, and recruited an extra 41,000 participants, as the first aim was to recruit 50,000 participants in total. Then there was a third wave quite recently, from 2018 to 2019, in which 150,000 new participants were recruited. So indeed we now have a cohort of more than 200,000, but today I'm mainly concentrating on examples with the first 50,000. As you see, we already have quite a long follow-up: someone who joined in 2003 is now 18 years older. And, as you might expect, some people are no longer alive, as there was no upper age limit. Of course, if someone joined the biobank in 2003 at 90 years old, this person is likely no longer alive, and several younger people have died as well. So the picture looks as you see here: the older the individual was at recruitment, the more likely it is that the person is no longer around. According to the most recent linkage information, close to 6,000 individuals have died. How do we know that?
Because the cohort is regularly linked to the Estonian Causes of Death Registry, as all participants have given their consent for us to link our data with the different health-related registries of Estonia. So how do we analyze these data? Of course, mortality data are not a very happy data set, but that's life: every life ends at some point. The ultimate aim of medicine and its developments, and basically also of what we do in our network, is to help people have a longer and healthier life. So studying the length of life of individuals in large cohorts is extremely important for understanding the main factors influencing it. To analyze such data we need survival analysis methods, so sorry for some technicalities. When we had the first kickoff workshop in Basel a couple of years ago, I gave two lectures on survival analysis. I expect that people who have needed these methods probably remember them, but those who have not needed them in their work might have forgotten some, and maybe not all of you were there. So I'll just give you a brief overview of the most important concepts. Survival analysis refers to the general methodology for analyzing time-to-event data, and the final event does not have to be death. The same methods are used to analyze, for instance, data on incident diseases, where the final event is the date of diagnosis, but it can be anything else. I have seen interesting studies on, say, time to pregnancy in people who are trying to have a baby and maybe have some complications. It doesn't even have to be about humans: I've seen examples from insurance, such as time to ending a contract with a company, and so on. So it's quite generalizable to many fields. To do survival analysis you have to think of time and time scales: you are interested in the length of a time interval.
It can be the time from birth to death, or something else, such as the time from biobank recruitment to death or to a diagnosis. The main problem in survival analysis, and the reason we actually need all this methodology, is censoring: for many individuals in the study cohort the survival time is not yet known, simply because these people are alive. When we linked the cohort with the Causes of Death Registry in 2019, we knew the survival of participants up to 2019. For those who had died before then, we know how long they lived and what the date of death was. But for the others, who died afterwards, we were not able to know it in 2019. Now we have, in theory, two more years of data, but luckily many participants are still alive. Probably none of us will live long enough to be sure we have measured the survival times of all biobank participants, because many biobank participants are younger than me. So these methods have to handle this feature, so-called censoring. What do we know about these individuals? We know when they were born and how old they are now, or were the last time we linked the data sets. For instance, we might know that an individual was 73 years old at the last linkage, but we don't know what happened afterwards. We do know a lower bound of the survival time, and that can be used. There is another problem that is less frequently mentioned, called left truncation. Think of a biobank cohort where recruitment started in the year 2000. If someone who was 80 at that time was recruited, this person was not a typical representative of individuals born at that time, in 1920, but a representative of individuals who managed to survive up to age 80.
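The censoring and delayed-entry ideas above can be made concrete by reducing each participant to an (entry age, exit age, event) triple on the age scale. This is a minimal sketch, not the biobank's actual data pipeline: the `Participant` class, the `survival_record` helper and the 365.25-day year are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

DAYS_PER_YEAR = 365.25  # approximate year length, adequate for illustration


@dataclass
class Participant:
    """Hypothetical minimal participant record."""
    birth: date
    recruitment: date
    death: Optional[date] = None  # None: no death found in the registry


def years_between(start: date, end: date) -> float:
    """Elapsed time in approximate years."""
    return (end - start).days / DAYS_PER_YEAR


def survival_record(p: Participant, linkage: date) -> Tuple[float, float, bool]:
    """Return (entry_age, exit_age, died) on the age time scale.

    entry_age is the age at recruitment (delayed entry, i.e. left truncation).
    exit_age is the age at death if the death was registered by the linkage
    date; otherwise it is the age at linkage and the record is right-censored.
    """
    entry_age = years_between(p.birth, p.recruitment)
    if p.death is not None and p.death <= linkage:
        return entry_age, years_between(p.birth, p.death), True
    return entry_age, years_between(p.birth, linkage), False


if __name__ == "__main__":
    linkage = date(2019, 12, 31)
    people = [
        Participant(date(1946, 5, 1), date(2003, 6, 1)),                     # alive
        Participant(date(1920, 1, 1), date(2003, 3, 1), date(2010, 7, 15)),  # died
        Participant(date(1950, 2, 1), date(2009, 9, 1), date(2021, 4, 1)),   # died after linkage
    ]
    for p in people:
        print(survival_record(p, linkage))
```

The third person illustrates the point made in the talk: a death after the 2019 linkage is not yet known, so at that linkage the record is censored at the linkage age.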
So in the biobank we had someone who was already 80 at recruitment. Now, 20 years later, say in 2020, we also see individuals who were 60 at recruitment but have since become 80. These 80-year-olds 20 years later are maybe not completely comparable to the 80-year-olds 20 years before, because the earlier ones were a very selective subset: the ones who managed to live that long. That creates the problem called left truncation: those who died before recruitment started were never able to join the biobank, so we don't know anything about them. We have to keep in mind that in this sense the cohort can't be a random sample. It can be a random sample only conditionally: if we are recruiting people now, we can pick a random sample of people who are 80 years old now, but we can't pick a random sample of individuals who were born 80 years ago. As I said, there are two approaches to analyze such data. The standard approach is to define the time variable of interest as time since recruitment: for everyone, time zero is the time of joining the biobank, and we measure how long the individual survives or, for most individuals, the time until censoring at the last data linkage. Many methods then move along the follow-up time, and at each time point where we observe a death, we also look at the individuals who are at risk at the same time. Suppose we want to analyze whether some covariate matters, whether it's a biomarker from an omics panel or something else, such as a conventional risk factor like smoking level or something from diet.
If we want to analyze the effects of such risk factors, the methods always compare the covariates of individuals who die at some time point with those of individuals who were at risk at that time. If we systematically see that the individuals who die have, say, higher smoking levels on average than those at risk who still survive, then we can conclude that smoking is a risk factor, for instance. Methods that use time since recruitment as the time scale handle censoring quite well, because someone who is censored leaves the risk set: this person is no longer at risk when someone else dies, that is, has the event of interest, after that censoring time. But, and I'll come back to that later, they don't handle left truncation, which can sometimes create biases. That's why another approach has been proposed: using age as the time scale. As you see on this graph, the lines don't start at zero, at birth, but at the age at which the individual joined the biobank. This example shows only individuals older than 75 at recruitment, just to have a sufficient number of events, or deaths, in the graph. The individuals at risk at a given age are those who had already been recruited to the biobank by that age. So if someone dies at, say, age 92, this individual has an event, and the risk set consists of the individuals who joined the biobank younger than 92 and who did not die and were not censored before age 92, so they were under follow-up at age 92. This approach has been shown to handle left truncation. I think in Basel I also showed some slides with simulations to convince you that the survival models in this case indeed give unbiased estimates.
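The two risk-set definitions above can be sketched side by side. This is an illustration only, with four made-up participants recruited after age 75; the function names are hypothetical and the convention of entering "just after" the recruitment age is an assumption.

```python
import numpy as np


def risk_set_time_on_study(follow_up: np.ndarray, t: float) -> np.ndarray:
    """At risk at time t since recruitment: still under follow-up at t.

    Everyone starts at time 0 (recruitment), so only exit times matter.
    """
    return follow_up >= t


def risk_set_age_scale(entry_age: np.ndarray, exit_age: np.ndarray,
                       age: float) -> np.ndarray:
    """At risk at a given age: recruited before that age (delayed entry)
    and not yet dead or censored before it."""
    return (entry_age < age) & (exit_age >= age)


if __name__ == "__main__":
    # Four hypothetical participants recruited after age 75.
    entry_age = np.array([76.0, 80.0, 85.0, 91.0])
    exit_age = np.array([95.0, 92.0, 88.0, 93.0])  # participant 2 dies at 92

    # Age scale: who is compared with the person dying at 92?
    print(risk_set_age_scale(entry_age, exit_age, 92.0))
    # Time-since-recruitment scale: the same death happens 12 years
    # after recruitment, so the comparison group is chosen by follow-up only.
    print(risk_set_time_on_study(exit_age - entry_age, 12.0))
```

On the age scale the person recruited at 91 is compared with the person dying at 92, because both are 92 and under observation; on the time-since-recruitment scale that person drops out (only 2 years of follow-up), while people of very different ages are mixed together.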
There are also other reasons why this is easier: we can naturally interpret the estimated quantities as risks depending on age, not on time since recruitment, since recruitment does not change anyone's risk, hopefully. It is not an important time point in life in that sense, so recruitment is not a meaningful time zero, and follow-up time just tells us when each person is under observation. We then get survival curves estimated on the age scale. In this graph the X axis is an age axis; as we only had adults, it runs from 18 to more than 100. The Y axis is the probability of surviving: what we estimate is the survival function, the probability of surviving past a given time t. Of course, the older we get, the smaller this probability gets, and at 105 it is very close to zero for most of us; that's why the function has the shape you see here. In Estonia we see quite a big gap in life expectancy between men and women, which is also seen in the biobank data. I hope it can be shortened, as in countries with very high life expectancy; on the other hand, some countries are doing much worse. So how do we estimate the survival function? Just let me know if you can't hear me, because I saw a message that my internet connection was unstable. You were only gone for a second, so we could hear and understand the full sentence. Okay. It was not a long gap here. Okay, great. The most popular estimator of the survival function is the Kaplan-Meier estimator. It also moves along the time axis we have here; if the time scale is age, it runs over all ages in the data set. At each age where we observe a death, we count how many deaths we observe, d_i (quite often it's one, sometimes more than one), and the number of individuals at risk, R_i. We divide the two, so it's a kind of estimate of the instantaneous risk.
Now you were really gone for a sentence or so, so there was a gap. Okay, sorry, I'll repeat. At each time point we count the number of deaths and the number of individuals at risk. If no deaths are observed at a time point, we don't add a term to the product, because it would just be one. With each term you estimate the conditional probability of surviving past that time point: d_i divided by R_i is the probability of dying at that time point given that you are at risk, that is, part of R_i, and one minus this is the probability of not dying. We also saw this in Basel. So that is the estimate: the product of these terms over all death times up to t. This estimator can handle censoring, and it can handle left truncation if the time axis is age. We can also be more specific: it doesn't have to be only men and women, it can also be people with different values of biomarkers. I'll come to models soon; just to let you know, different omics layers enable a really detailed analysis of the pathways leading to mortality. If there are any questions in the meantime, let me know, either in the chat window or just by asking. I will, let me check the chat. There are none at the moment; I'll let you know when there are. Okay. For modeling we think of another important function, the hazard function. It seems a bit more complex, not for mathematicians but maybe for others. The hazard function is a kind of instantaneous risk, built from a conditional probability: conditional on living up to that age, say t = 60, we ask for the probability of dying just after becoming 60, in a small time interval between t and t + dt, divide it by dt, and let dt go to zero. So it's a function that measures instantaneous risk.
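The Kaplan-Meier product described above, including the age-scale risk set that handles left truncation, can be sketched in a few lines. This is a minimal illustration, not production code: it assumes every exit time is strictly later than the entry time, and the function name and argument names are our own.

```python
import numpy as np


def kaplan_meier(entry, stop, event):
    """Kaplan-Meier survival estimate allowing delayed entry.

    entry: entry times (ages at recruitment; use zeros for no truncation)
    stop:  exit times (age at death or censoring), assumed > entry
    event: True where stop is a death, False where it is censoring
    Returns the distinct death times t_i and S(t) just after each t_i.
    """
    entry = np.asarray(entry, dtype=float)
    stop = np.asarray(stop, dtype=float)
    event = np.asarray(event, dtype=bool)

    death_times = np.unique(stop[event])
    surv = np.empty(death_times.size)
    s = 1.0
    for k, t in enumerate(death_times):
        d_i = np.sum(event & (stop == t))          # deaths at t
        r_i = np.sum((entry < t) & (stop >= t))    # recruited and still observed at t
        s *= 1.0 - d_i / r_i                       # conditional survival past t
        surv[k] = s
    return death_times, surv


if __name__ == "__main__":
    # Toy data: four people followed from time 0, the second one censored at 2.
    times, surv = kaplan_meier([0, 0, 0, 0], [1, 2, 3, 4], [True, False, True, True])
    print(times, surv)
```

Times without deaths contribute no factor, exactly as in the talk; the censored person simply leaves the denominator after time 2.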
Because of this limit, the value of the hazard function itself does not really have a direct interpretation, but we quite often work with proportional hazards models, which assume that covariates act multiplicatively on the hazard. The most famous modeling approach is the Cox proportional hazards model, named after David Cox. I think David Cox was born in 1924; he's still alive and was giving seminar talks quite recently. He's quite an amazing person, from Oxford, and he proposed this analysis approach in 1972. The model states that, conditionally on covariates, the hazard consists of a baseline hazard, which can be anything but does not depend on covariates, multiplied by a factor that depends on the covariates. This factor can be larger or smaller than one, so the covariates can make the hazard either higher or lower. The hazard ratio corresponding to a unit change in one covariate is just e to the power of its parameter beta; that's why the parameter estimates from proportional hazards models are interpreted as log hazard ratios. The mathematical estimation idea is also quite nice: the concept of partial likelihood comes down to a relatively simple function that has to be maximized. We won't go into detail here; just note that this is the main analysis approach and it is implemented in many standard software packages. Using this methodology, plus some tricks to make the algorithm run in the GWAS setting, as it was quite slow initially, we were able to run GWAS analyses. The first paper, with colleagues from Edinburgh, was published in 2015 or 2016, and in several follow-up papers we discovered SNPs associated with lifespan. It's interesting that in the very first GWAS we had about 270,000 lifespans, not of the UK Biobank participants themselves but of their parents.
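For reference, the quantities described so far can be written in their standard textbook form. This is a sketch in our own notation, not the speaker's slides: $d_i$ and $R_i$ are the deaths and the number at risk at the $i$-th death time, $R(t_i)$ is the corresponding risk set (on the age scale, those recruited before $t_i$ and still under follow-up), $\delta_i$ is the death indicator, $e_j$ is the unit vector for covariate $j$, and the partial likelihood assumes no tied death times.

```latex
\begin{aligned}
\hat S(t) &= \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{R_i}\right),
\qquad
h(t) = \lim_{dt \to 0} \frac{P(t \le T < t + dt \mid T \ge t)}{dt},
\qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right),\\
h(t \mid x) &= h_0(t)\,\exp(\beta^\top x),
\qquad
\frac{h(t \mid x + e_j)}{h(t \mid x)} = e^{\beta_j},\\
L(\beta) &= \prod_{i:\, \delta_i = 1}
\frac{\exp(\beta^\top x_i)}{\sum_{k \in R(t_i)} \exp(\beta^\top x_k)}.
\end{aligned}
```

The partial likelihood compares, at each death, the covariates of the person who died with those of everyone in the risk set, which is the comparison described in words earlier in the talk.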
Each participant reported whether their parents were alive and, if not, at what age they died. As participants receive their DNA from their parents, the association between a participant's DNA and the parents' lifespan also provides valid information about genetic markers associated with lifespan. Two markers came out, and they are quite interesting: the first is known to be associated with smoking behaviour, with how easily people get addicted to smoking, and the other is associated with cholesterol level and known cardiovascular risk factors. Based on later studies we were able to put together a polygenic risk score for lifespan, consisting of several thousand markers. This gave us the opportunity to see that individuals in the first polygenic risk score decile had clearly higher median survival than individuals in the tenth decile. The difference was not huge, about three to five years in the median, as you see, but it is still quite an interesting finding. I can also show you another study on other omics markers, namely NMR-based biomarkers, meaning biomarkers based on nuclear magnetic resonance spectroscopy. We had quite interesting data of this kind in the Estonian Biobank. That paper came out even earlier, in 2014, and we found a surprisingly strong effect of some NMR biomarkers on all-cause mortality and also on cause-specific mortality. Now, if you put these two things together, the polygenic risk score and the NMR biomarkers, this is a slide I haven't shown much; it is still unpublished, but it's quite interesting that you can see quite a huge difference.
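In its simplest form, a polygenic risk score like the one mentioned above is a weighted sum of allele counts, and comparing the first and tenth deciles only requires cutting the score at its quantiles. This is a minimal sketch with simulated genotypes and simulated weights; the function names and sizes are hypothetical, and the talk does not describe how the actual score's markers and weights were chosen.

```python
import numpy as np


def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of effect-allele counts (0, 1 or 2) per individual."""
    return dosages @ weights


def score_decile(scores: np.ndarray) -> np.ndarray:
    """Decile of each score: 1 for the lowest 10%, 10 for the highest 10%."""
    cuts = np.quantile(scores, np.linspace(0.1, 0.9, 9))
    return np.searchsorted(cuts, scores, side="right") + 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_people, n_markers = 1000, 500                   # hypothetical sizes
    dosages = rng.integers(0, 3, size=(n_people, n_markers))
    weights = rng.normal(0.0, 0.01, size=n_markers)   # simulated effect sizes
    deciles = score_decile(polygenic_score(dosages, weights))
    print(np.bincount(deciles)[1:])                   # people per decile
```

Survival curves per decile, such as the first-versus-tenth comparison in the talk, would then be estimated separately within each decile group.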
If you look at one end, for instance men who have a high polygenic score and also a high score based on the NMR biomarkers, and at the other end the men who have both scores low, the difference between their expected survival is large. If you look at the horizontal line at 0.5, where the median lies, the median is only about 60 years at one end and more than 80, something like 85, at the other end. The two other curves are also quite clearly distinct, so you can see that both scores, the polygenic score and the NMR-based score, not just one of them, clearly have an effect. This holds not only for males but also for females, although for females it's very interesting: if the NMR score is low, below the median, the polygenic score doesn't seem to contribute, but if the NMR score is high, the polygenic score apparently adds something extra. So it's quite an interesting story. Anyway, such plots are nice to see and scientifically interesting, but we also have to think of our biobank participants. At least in Estonia, we have promised our participants that they will get some feedback at some point. But what can you say?
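Reading the median off a curve, as with the horizontal line at 0.5 above, can be sketched for a step-function estimate such as Kaplan-Meier. This uses the common convention of taking the first time at which the curve reaches 0.5 or below; the helper name and the example values are illustrative only, not results from the study.

```python
import numpy as np
from typing import Optional


def median_survival(times, surv) -> Optional[float]:
    """Median survival time read off a step survival curve.

    times: increasing event times (e.g. ages); surv: S(t) just after each time.
    Returns the first time at which S(t) drops to 0.5 or below, i.e. where the
    curve meets the horizontal line at 0.5, or None if it never does within
    follow-up (the median is then not estimable).
    """
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    crossed = np.flatnonzero(surv <= 0.5)
    return None if crossed.size == 0 else float(times[crossed[0]])


if __name__ == "__main__":
    # Made-up step curve on the age scale.
    print(median_survival([60, 70, 80, 85], [0.9, 0.7, 0.5, 0.3]))
```

Returning None rather than the last observed time avoids reporting a median for a group whose curve has not yet reached 0.5, which can happen for low-risk groups with limited follow-up.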
What can you say to an individual, knowing that he or she is on the red, the green, the blue or the yellow (orange) line? That alone is not informative. Suppose we take someone aged 70 who is on the red curve here. We see that such an individual didn't have a very high probability of surviving up to 70, but this person is already 70, so we can't tell them that their probability of being alive now is 25%. We have to say something about the expected future, so we have to work with conditional probabilities. For instance, for someone aged 70 we have to think of the conditional probability of surviving up to 80, given that we already know this person is 70 now. You can derive it from the curve by dividing two values, with the current value at age 70 in the denominator and the value at age 80 in the numerator, but you can't see it so easily from the plot. So there is an alternative idea, which is to use a concept called biological age. If we have a lot of biomarkers, like these NMR biomarkers and the polygenic score, then instead of plotting survival curves we can estimate the expected age of the individual given his or her biomarker profile, and we call that the biological age. If your biological age is greater than your real age, say you're 50 right now but you get an estimated biological age of 60, then your overall risk level is more comparable to someone aged 60, and you maybe have to do something about it. Or if you are told that your biological age is 40, you know that your biological age is younger than your real age, so you're doing absolutely fine. That could be easier to interpret. You can look at some papers on biological age; for instance, metabolomic and
epigenetic aging is a really popular topic. The idea is quite simple: we have the biomarkers and the actual age, we fit some model, a simple regression or, if we have a lot of biomarkers, machine learning could do better, and we get a predictor of age. Then we look at the individual predictions. As you see, the points don't lie on the line; there are deviations between actual age and predicted age, and that deviation is supposed to tell something about risk. I did something very simple in the Estonian Biobank: I just used multiple linear regression, or stepwise linear regression, with a training set of 1,000 individuals, and predicted age in a validation set of 9,000. It is only an illustration, so it doesn't matter much that I maybe didn't get the perfect estimator. You can take, for example, an individual whose real age is 40 and whose predicted age is rather close to 60, and you could tell them something about their risk. The correlation is about 0.7. So does it make sense, is it helpful? I looked at the association of the difference between biological age and real age with some other risk factors. Females seem to be biologically one year younger, which makes sense, although one year is too little given the much larger difference in expected lifespan between the two genders. A high body mass index makes you two years older; okay, that makes sense, but maybe we should expect more. Smoking makes you four years older. But what is funny is that having at least secondary education seems to make you four years older, which is contrary to what we see in survival analysis: in the Cox model, educated people have a lower hazard. Having a past myocardial infarction also seems to make individuals younger, but its hazard ratio is two, so if you have had one
myocardial infarction you are definitely at higher risk, and the same holds for prevalent cancer. If we additionally adjust for real age, these differences more or less disappear, but that is probably something we shouldn't do, and we would still expect smoking to make a difference. So why are some associations opposite to the effects on mortality hazards? Think of our cohort and the education example: people who don't have secondary education are much more common in older generations than in younger generations, in which secondary education became accessible to everyone, was promoted and became more or less compulsory. So if you regress age on biomarkers that for some reason are associated with secondary education, these biomarkers also tend to be correlated with age, because older people have less education, and you see confounding. I can show you another graph on that. What is actually going on, I think, is that your age has an effect on the biomarkers, and other risk factors, like smoking, a bad diet, other bad habits and maybe underlying diseases, also affect these biomarkers, and the biomarkers in turn tell something about the risk level. If your risk level resembles that of someone younger or older than you, you might think that this is due to C modifying it. But if you know something about causal inference, it's not so easy to see from this picture why it is like that, and why it should be like that. If, for instance, C has no direct effect on risk and only works via X, then you could probably assume it is okay, but not in general. For instance, what we also see in the Estonian Biobank is that smokers tend to be younger: smoking prevalence has increased over time, and among old people you see more individuals who have never smoked
than in the younger generation. Smoking, for instance, strongly affects the NMR biomarker profile. Now, if you look at the association between the biomarker profile and age, the smoking-associated biomarkers mirror not only the harmful effects of smoking but also the fact that smokers are younger. So a certain level of such a biomarker can reflect either a lower smoking level or an older age, and it is hard to tell the difference. So it doesn't necessarily tell you about the risk level, and probably something should be modified. We have therefore proposed an alternative approach that looks at survival curves rather than at linear regression. Suppose that, given the Cox model, the survival curve for a certain individual is the blue one and the curve for the population is the black one. Given that you are 50 right now, your probability of surviving past 50 is a bit lower than the population average. If you move horizontally to the right, you see that your current risk level is about the same as that of someone about five years older, 55, in the population on average, so you can tell from here that your biological age would be 55 given that risk profile. So here is the point at 50, the dotted line leads horizontally to a point on the population survival curve, and you ask what age that point corresponds to. For that you have to be able to solve the equation, ideally analytically. The problem is that the Kaplan-Meier method is non-parametric, so you don't get a simple formula from it, although you can have a non-parametric estimate of biological age as well. Alternatively, of course, you can look at parametric approaches, but
let's first look at the formulas. There is the unconditional survival function in the cohort, just the average, and then there are individual survival functions, which come from your measured covariates. You are looking for the time point at which the population survival function equals your own survival function at your current age. So these two functions have to be equal, and B, the age at which this happens, is what we actually want to find: the risk-based biological age. It can be solved if you find a nice distribution fitting your data. If we are working with human survival in biobanks, meaning not the survival of people with a specific disease but just general survival, the Gompertz distribution fits surprisingly well. The Gompertz distribution is a simple distribution whose hazard function is just an exponential function of time. It has two parameters, a and b: if a is negative the hazard is decreasing, and if a is positive it is increasing, so for humans we would usually assume an increasing hazard. The parameter b is just a scale multiplier. The survival function involves two nested exponentials, but otherwise it is relatively simple. Given this parametric form, if we go back to the equality, we can actually solve it. Again, I'm not going into detail on how it comes out algebraically, but if you have an estimated individual survival function and the population survival function, you can estimate the biological age. What came out was, first of all, that the correlation between real age and biological age is higher, but that is because of the estimation mechanism: if the population survival curve and the individual survival curve were always equal, we would get a straight line, but then it wouldn't be interesting, because biological age would equal real age. Having more and more covariates creates these deviations. So if we look at the association of this estimated biological
age with known risk factors, most of them do make sense and go in the right direction: body mass index and smoking increase biological age, as do a past myocardial infarction and prevalent cancer. But the effect of education, which should decrease biological age, isn't reflected by this biological age. How do we get it corrected? If we already have a survival model and go back to what we want to do, which is to estimate this blue line as well as possible for each individual, why don't we put effects like sex into the model already? If we add other well-known risk factors, we actually get a weaker correlation between biological age and real age, because we see more variation between individuals. Now, if you look at the difference between biological age and real age, you really see that females are biologically younger, or at least younger according to this gauge; a high body mass index increases biological age, smoking also increases it, as does a myocardial infarction or cancer, and education kind of takes six years off. Now let's compare the different versions in terms of the difference between biological age and real age. In the first version, linear regression (of course we could do better with machine learning, as I said, but you might still have the same problems), if your youngest participants are 20, then given your data 20-year-olds can only be biologically older, and 80-year-olds can only, or mainly, be biologically younger. So the probability of being biologically older than you really are decreases with age, because the probability of being biologically younger increases with age. In version two, if we only use metabolomics and not the other factors, the difference is not associated with real age. That is good, because indeed at any age you can be either biologically older or younger, depending on your biomarker profile. You can also see that with this version you can't be
very often much more than 10 years younger than your real age. Version three gives the most variability: quite often you can be biologically more than 10 years younger, and also more than 10 years older. So, to summarize, I would say that the causal interpretation of the conventional biological age does remain unclear. Quite often you have to think carefully about how it is obtained and what it reflects. It is probably good to play around with, but whether the feedback based on it can be taken seriously is another question. We have proposed an alternative that has a more direct interpretation. For research, I don't know whether we need it: if we go back to what we did, we fitted a Cox proportional hazards model with the biomarker profile and all other risk factors as covariates, so we can study the hazard ratio corresponding to each of these factors, and that might be sufficient for research. But if we want to tell individuals about their risk level, this biological age might be more intuitive. If you say that smoking increases your mortality hazard two times, it probably doesn't tell you much. Maybe for scientists who have worked a lot with survival analysis and the proportional hazards model it is interesting to know, but not for the general public. But if you say that smoking makes you biologically seven years older, that it puts seven years on you, that is probably something individuals can interpret much more clearly. And here I want to finish the main part of my lecture. I'm happy to answer any questions. I'm sorry, I hadn't been lecturing for some time and my voice was breaking in the middle of the talk, but now I'm quite fine to answer any of your questions. Thank you very much. A very exciting talk, Krista; we send a round of applause to you. Now we have time for questions. Giovanni has a question. Thank you, a very, very interesting talk. I have a probably very basic question: what
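One way to see how a hazard ratio can be re-expressed as "years older" is to assume a Gompertz baseline hazard, h(t) = a·exp(b·t), together with proportional hazards. An individual with hazard ratio r then has hazard r·a·exp(b·t) = a·exp(b·(t + ln(r)/b)), which is the baseline hazard of someone ln(r)/b years older. The slope b = 0.1 per year below is an illustrative assumption (adult mortality roughly doubling every seven years), not a figure from the talk, and this sketch is not necessarily how the estimate quoted above was computed.

```python
import math


def gompertz_hazard(t: float, a: float, b: float) -> float:
    """Baseline Gompertz mortality hazard at age t."""
    return a * math.exp(b * t)


def years_older(hazard_ratio: float, b: float) -> float:
    """Age shift that multiplies a Gompertz hazard by hazard_ratio."""
    return math.log(hazard_ratio) / b


A, B = 1e-5, 0.1  # illustrative Gompertz parameters (assumptions, not fitted values)
shift = years_older(2.0, B)
print(f"HR = 2.0 -> {shift:.1f} years older")  # ln(2) / 0.1 is about 6.9

# Sanity check: doubling the hazard at age 50 equals the baseline hazard at age 50 + shift.
print(math.isclose(2.0 * gompertz_hazard(50, A, B), gompertz_hazard(50 + shift, A, B)))
```

A hazard ratio below one gives a negative shift, i.e. "biologically younger"; the conversion depends entirely on the assumed slope b.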
are the kinds of applications that you would see for this type of risk score? Say you were able to develop a very good risk score that is highly predictive: would you aim to use it for something like interventions, notifying people who are flagged as high risk so that they can fix their lifestyle, or for some other application? Yeah. One of the central applications, and also a research direction, is to use Mendelian randomization. As with polygenic risk scores, if you find genetic markers that predict lifespan, then you can also look at the genetic correlations of lifespan with other traits, and from that you can derive how much of the effect comes from, say, smoking, how much comes from your liability to get heart disease, and how much comes from your liability to get cancer. That was actually done in one of these papers by Paul Timmers and others, published in eLife. Another application, once you have all this information, is feedback. I think there are also some apps developed to give feedback about biological age to individuals, so improving the methodology may make these estimates more precise and more accurate, so that individuals who actually have higher hazards get that feedback and know it. Knowing that your biological age is older than your real age, and seeing the reasons for that (one reason could be smoking, another that your diet is probably not so good, or that your body mass index is too high), would maybe also be helpful for interventions. And it's also nice for people who have done everything as well as possible to get confirmation that they are biologically younger than they really are. So interventions could also be considered, I think. Thank you, Giovanni, and Krista. Vesna now has a question, please. Hi, thank you for your talk. I have a question about the Kaplan-Meier
curve, because I recently started reading about it, and I was wondering whether it's possible to spot the presence of censored data just by looking at the shape of the curve. That is an interesting question. I think by looking at the curve it is not so easy to tell. Sometimes, if the data sets are small, censorings are marked on the curve with a small tick; I think at some point R did that by default. But in general I would say there's no easy way to tell. If you look at the end of the curve: if everyone has died, the curve will end at zero, because then you know when the last one died. If not, the curve does not end at zero, as you also see here. So that is one thing, but the curve can end at zero even when there is censoring, depending on whether the last observation was actually a death or a censoring. Thank you, Vesna. I have a question for you; it's a bit like the elephant in the room for me after the introduction, in which I said that you are part of the Estonian corona task force. Can you apply this thinking? Because there are people who say that all the ones who died from corona would have died anyway. So can you give, for example, the number of years that a corona infection makes you older? Yes, I think that has even been done, but I can't tell you the number. I know that a hazard ratio of roughly two can be applied, but it depends very much on what time frame you are looking at. In general, in any age group, the probability of dying within a year given that you get infected by corona is two times higher, so the hazard ratio for dying within a year is roughly two. But how long does it last? If you look at the first months after getting infected: I looked at certain age groups, like men older than 80, and their probability of dying after getting infected was really high, something like 25% or, say, 20% based on Estonian data. So if men older than 80 get infected, one in five will die, and that happens within a month. It is not the case that
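The point about the end of the curve can be checked with a minimal Kaplan-Meier (product-limit) estimator. This is a sketch written for illustration, not a substitute for an established survival package. Events are coded 1 for a death and 0 for a censored observation, and at tied times censored subjects are counted as still at risk, the usual convention. The three toy data sets are made up.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(t)) at each distinct death time.

    events[i] is 1 for a death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censorings at t both leave the risk set after t.
        at_risk -= removed
    return curve


no_censoring = kaplan_meier([1, 2, 3, 4], [1, 1, 1, 1])
last_censored = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 0])
last_is_death = kaplan_meier([1, 2, 3], [0, 1, 1])

print(no_censoring[-1][1])   # 0.0: everyone died
print(last_censored[-1][1])  # 0.375: last observation censored, curve stops above zero
print(last_is_death[-1][1])  # 0.0: censoring present, yet the last death takes the curve to zero
```

The third case is why a curve reaching zero does not by itself rule out censoring: only the status of the last observation decides where the curve ends.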
one in five men aged 80 and over will die within a month in the general population, so that is a much, much higher risk ratio. So it's indeed a question of what the best way to present it is, as COVID is something that creates a very high risk immediately, but the long-term hazards still need to be studied. Actually, this study is ongoing, and our first results show that for older individuals the increased hazard can stay high for quite long. Thank you, that makes sense. Are there further questions? So, I should mention that Krista kindly agreed to meet our ESRs for a half-hour discussion round, so I invite all ESRs to join it. I thank Krista again: she also jumped in here for an external speaker who got sick, and I would also like to acknowledge her efforts in co-organizing this summer school. It was originally scheduled to take place in Tartu, Estonia, but due to corona we had to move it online. Unfortunately; we all would have liked to see Estonia. But still, I think we had a very good start into the first day, and I hope for further exciting days. I thank Krista both for her talk and for her efforts in co-organizing this. Thank you very much. Thank you very much, Karsten. I also have to confess that I wasn't really so much involved in organizing it; you, Karsten, and Katarina did most of the work, and I'm really grateful, because indeed, first of all, we were not able to have it in Tartu, and then I was extremely busy, and I'm still extremely busy, with all that COVID stuff. Unfortunately, it's still bothering us quite a lot; we hope that vaccines will finally turn things in a good direction. That's perfectly understandable, but you were involved when we were discussing speaker invitations and so on, so we are grateful for that. But I just wanted to say that the reason why I'm not in my office in Tartu but actually in Riga, talking to you from a conference room, is that I'm busy with being a member of the Executive Board of the International Biometric Society,
which is the largest society of biostatisticians. We're organizing the International Biometric Conference in Riga, Latvia, next July. It's my favorite conference, especially for biostatisticians working in medicine, biology, forestry, agriculture, a wide range of fields, and it's the first time in history that this conference takes place in the Nordic-Baltic region at all. The next one will be in 2024 in Buenos Aires, so if you want to come to this conference while it's in Europe, then July 2022 is a good time. Okay, but now I'm happy to discuss in the breakout room. We recommend this event, and we thank you again and wish you and the students a very good discussion in the breakout room. Take care, everyone; we'll see you at the latest tomorrow morning at 8:30 with the next speaker. Thank you very much. Bye-bye.