It's my pleasure to introduce the second speaker of the afternoon, Andrea Ganna, from the Institute for Molecular Medicine Finland, Harvard Medical School, and Massachusetts General Hospital. Andrea finished his PhD in epidemiology and biostatistics in 2015 at the Karolinska Institutet in Stockholm, and was then a postdoc in Boston at MGH, Harvard Medical School, and the Broad Institute. For the past two years, since 2019, he has been a FIMM-EMBL group leader at the Institute for Molecular Medicine Finland (FIMM), and since the same year he has also been an instructor at Harvard Medical School and MGH. He is already leading two big initiatives: the COVID-19 Host Genetics Initiative and the INTERVENE consortium, which tries to combine electronic health records with genetic information for phenotype prediction. He is the recipient of an ERC Starting Grant and is doing, as you can tell from this CV, exciting work at the intersection of epidemiology, genetics, and statistics. We are happy to have you here, Andrea, and we are looking forward to your talk.
Okay, thank you so much. I'm very glad to be invited. I'm also glad that I don't have to speak about COVID-19, because lately I've been speaking a lot about COVID-19, and I'm really excited to speak about something else, specifically this project. This is actually the first time I'm presenting this work, so bear with me if anything is unclear. We are just finalizing the analysis, but I think it's quite interesting, and I thought that for a summer school in machine learning this could be more relevant than yet another GWAS. So, I work in Finland, and in Finland we collect a lot of health data, as in most of the Nordic countries, because people tend to trust the state and the government.
Back in the 1960s we were collecting very little information, just about cancer. Over time you can see how we collect more and more: starting with the inpatient register, which captures all hospitalizations, then the outpatient register, which captures day visits to the hospital. There is a really big jump in 1995, when every single purchase of a drug made in a pharmacy in the country started to be registered. Finally, more recently, we have started to have primary care data as well, though we don't necessarily capture visits to private healthcare. So the amount of information keeps increasing. To leverage all this information and do fun stuff, we decided to start a project called FinRegistry. To you it may look kind of simple, but in reality it took three years of paperwork, a lot of applications, and some political skill. In the end we managed to put together quite a comprehensive set of registries that are available for the entire Finnish population. Just to give you a sense of the sheer data size: in total there are around 4.1 billion medical encounters across several different registries. The cancer registry is the oldest one, collecting information since the 1950s. The population registry captures demographic information, for example where you have been living, whether you have married, and, importantly, all the family relationships: with your spouse, your mother, your father, your siblings, and so on. This is very important for the project I'm presenting later. Then we have registries for hospitalizations, and registries with education and occupation recorded longitudinally, starting with the censuses in the 1970s and then collected on a yearly basis from the 1990s, including which kind of university degree you have taken and in which subject.
Then there is whether you receive social assistance, both you and your spouse, which is also important for the project I'm presenting. There is a lot of information about your birth: when you were born, your weight, whether it was through IVF, whether your mother was a smoker, and some diagnoses at an early age. And finally there are drug purchases and primary healthcare, although, as I mentioned, not yet the primary healthcare that goes through private providers. Private care is a bit more common for primary healthcare than for hospitalization, which is mostly public, and many people do go to private primary healthcare.
So the FinRegistry project is really about leveraging this nationwide health information to explore risk trajectories of diseases and to do what I call high-throughput epidemiological studies, as well as to develop new machine learning methods, both for prediction and for causal inference. The idea is not to use this data set to test one hypothesis; I think that is the main restriction we put on ourselves. Rather, we want to let the data speak, and to use methods that can interrogate the data and generate new hypotheses that can then be followed up in more targeted studies. The FinRegistry project includes around 5.3 million individuals, which is basically everyone alive in Finland, a relatively small country, and then all the relatives of these individuals, so it covers around 7.1 million people, many of whom are dead, across all the registries I mentioned before. Clearly the registries do not have uniform coverage across time, so for some individuals they will cover only the last years of their life, for example. Here you can see the distribution of birth year for those who are alive. We have some centenarians, and you can see that the count of males is lower at older ages because men die younger, so this trend is expected.
I am not going to touch on the drug purchases, but I think it is an extremely exciting data set, because it is by far the largest contributor to the data size: around 800 million drug purchases since 1995. What you see here, as an example, are drug purchases for asthma. Here you have time, here you have individuals, and in blue you have the times when you go to the doctor, the hospital, or primary care for something related to asthma, so with this specific ICD code. You can appreciate how irregular these trajectories are. Actually, in a certain sense they are regular, because you get a new prescription every three months, which means you tend to go and purchase the drug every three months; the trajectories are also determined by the package size and so on. But there are also a lot of irregularities that are not simply explained by the package size or the prescription pattern and that may be due to something else. So something we are doing, and are interested in doing, is to study the determinants of drug adherence and drug purchasing behavior.
The other important component, which I think a lot of electronic health records do not have, especially the hospital-based ones that are very popular in the United States, for example, is the social determinants of health. As I mentioned, we have information about education, to a certain extent income, marriage history, and so on. I think these are important when we want to understand the relative contribution of social determinants of health versus past health trajectories, as well as genetics. Genetics is another interest of mine, and for a subset of these data we actually have genetic information, so we can do some of these analyses, teasing apart how the different components contribute to disease risk.
The most time-consuming and complicated thing when you get these data is how you define diseases. Diseases are the typical anchor for navigating these large data sets, and there has been a lot of work in other countries, like in the UK with the CALIBER resource and in the US, to a certain extent, with the PheWAS resources, to map different codes to disease entities. In Finland we have the FinnGen project, which is the project that contains the genetic data, and because it is such a large and well-funded project, we had quite a lot of clinical groups working on defining disease endpoints. This was very important, I think, for the entire research ecosystem, even though it was generated for only the subset of the data with genetic information. Here you can see, for example, how we define heart failure using all the registry data. We combine ICD-10, ICD-9, and ICD-8 codes; these are diagnostic codes that go back, I think for ICD-8, even to the 70s or 80s, and there has been a progression in ICD codes since then. We have certain reimbursement codes that are specific to the Finnish reimbursement system. And we also have specific drugs that target heart failure. The inclusion of drugs is very important, because otherwise you will miss a lot of cases, especially milder cases that may only go through primary healthcare; by using drug information you can rescue those.
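As a rough illustration of the endpoint logic described here, the sketch below flags a person as a case if any registry event matches a rule for that endpoint, and shows how adding drug purchases can pick up a person who has no diagnosis code. Everything in it is an assumption made for illustration: the `Event` record, the `HEART_FAILURE_RULES` table, and the specific code prefixes are hypothetical placeholders, not the actual FinnGen endpoint definition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    person_id: str
    source: str  # code system: "icd10", "icd9", "icd8", "reimbursement", "atc"
    code: str
    day: date


# Hypothetical rule set: code prefixes per source. These are placeholders for
# illustration only, not the real heart failure endpoint definition.
HEART_FAILURE_RULES = {
    "icd10": ("I50",),
    "icd9": ("428",),
    "icd8": ("4270",),
    "reimbursement": ("201",),
    "atc": ("C03CA",),
}


def matches(event, rules):
    """True if the event's code starts with any prefix listed for its source."""
    prefixes = rules.get(event.source, ())
    return event.code.startswith(prefixes)


def endpoint_cases(events, rules):
    """Return the set of person IDs with at least one matching event."""
    return {ev.person_id for ev in events if matches(ev, rules)}


events = [
    Event("A", "icd10", "I50.9", date(2005, 3, 1)),   # hospital diagnosis
    Event("B", "atc", "C03CA01", date(2010, 6, 15)),  # drug purchase only
    Event("C", "icd10", "J45.0", date(2012, 1, 9)),   # unrelated diagnosis
]

# Compare case ascertainment with and without the drug-purchase rules.
with_drugs = endpoint_cases(events, HEART_FAILURE_RULES)
diagnoses_only = endpoint_cases(
    events, {k: v for k, v in HEART_FAILURE_RULES.items() if k != "atc"}
)
print(sorted(with_drugs), sorted(diagnoses_only))
```

In this toy data, person B is only picked up when the drug rules are included, which is the "rescue" effect described above. A real definition would presumably also need exclusion rules and more specific drug criteria, which this sketch leaves out.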
When you have all these data, you naturally need to define disease outcomes both in a wide format, so whether you have ever had any of these diseases and your age at first diagnosis, and in a longitudinal format. Here you can see that the ID is the same but you have repeated endpoints, so you can draw trajectories of disease patterns, medication patterns, and other events over a lifetime. This work has been done by a working group that created a massive script that takes all the registries as input and outputs these longitudinal files, which can then be used for further analysis. This is just to give you a sense that it is not enough to put a lot of data together: the data need to be processed extensively, keeping in mind how they can be combined, and with some prior knowledge put in, to obtain better outcomes and better definitions.
Now, we have built a portal, and let me see if I can share that. It is called Risteys. Currently it uses only the subset of individuals who have genetic data, so around 300,000. Let's say we search, for example, for a major coronary artery disease event. We have around 3,000 disease endpoints, and here you can see how we define each endpoint, how it is related to broader and narrower endpoints or disease definitions, and the different steps we used to create it. We show how much having this endpoint increases your risk of dying in the next fifteen years, five years, or one year, or over the entire follow-up, as well as the cumulative incidence of the disease over time. And importantly, we have done survival analyses, around four million of them, where you can compare each disease endpoint against every other in a longitudinal fashion. For example, if you take angina, clearly having a diagnosis of angina substantially increases your risk of developing coronary artery disease, and you
can see that the risk increases especially when the two are close in time, so within a year of follow-up between angina and coronary artery disease, but it is also increased when you look at longer follow-up times. Now, every disease is associated with every other disease: whenever you have a disease diagnosis, your risk of having any other disease increases substantially. So what we have calculated is how unique a pair of associations is compared with what is expected given all the other associations, and you see that the pair angina and CHD is in the 99th percentile of the distribution, so it is quite a specific association. Similarly, we can calculate which drugs are more likely to be purchased once you have a diagnosis of a major coronary artery disease event. Here you are your own control, because we can compare your trajectory of drug purchases before your diagnosis of coronary artery disease with the one after, and see which drugs are enriched afterwards. So this gives some idea of what I call high-throughput epidemiology. This is all publicly available and you can go and explore it. Currently it is based on a subset of 320,000 individuals, but we are now expanding it to the nationwide data, which will be less biased, because clearly these individuals are a very selected cohort. This can provide a basis for future epidemiological studies that one can follow up on.
What I want to spend the rest of the time on is a project led by Aoxing Liu, who is a postdoc in my team, so all credit should go to her; I am just reporting some of her work. It uses a subset of the FinRegistry data set to explore which diseases impact lifetime reproductive success. I think it is better to call it an outcome, since it is not necessarily a success, but this is the idea. We actually use data from both Finland
and, through our collaboration, Sweden, which allows replication. So what is lifetime reproductive success, or outcome? Traditionally it is defined as the total number of children over an entire reproductive lifespan. Here you see two people. The first person had many children and many grandchildren, and so passes their genes through the generations. Person two is childless and so doesn't pass their genes through the generations. This is actually a real example from the Swedish data.
Why is it important to study reproductive success? From an evolutionary standpoint, it measures the rate of increase of individuals possessing a certain genotype or phenotype, so it is a proxy for biological fitness. From an epidemiological standpoint, you study it when you are interested in reproductive health. And from a demographic standpoint, it allows you to better understand population growth, population trajectories, and aging. Why is it specifically important to study childlessness? Childlessness is what we are going to look at. We are not going to look at lifetime reproductive outcome, so the total number of children; rather, we are going to compare people who never had children with people who had at least one child. The reason is that it turns out very different processes determine going from zero to one child and going from one to two, three, four, or five children: the diseases and the environmental factors that determine these reproductive choices are quite different. Not having children, especially in the past, was associated with social stigma. From a genetic standpoint it is a proxy for fitness. There is an impact on long-term health, for example it has been shown that having children, and having more children, reduces the risk of breast cancer, as well as on socioeconomic status. And yet surprisingly little is known about how diseases
impact the chance of not having children, especially in men. Little is known for two reasons. The first is that people have always focused on a few diseases with a clear biological impact on childlessness, like infertility or other female-specific reproductive diseases. The second is that, to know that a person didn't have any children, we need to follow that person for long enough to be sure, so we need a very long follow-up, and it is quite unusual for electronic health records nowadays to have a very long follow-up time. Here, however, we can follow people for 40 to 50 years, so we can see that they have concluded their reproductive period without children.
It is important to have a conceptual framework for how diseases can impact childlessness. There are many different ways, and we will not be able to tease apart all of them. The first, which I think is the first that comes to mind, is that they directly impact fecundity, for example semen quality or the ability to conceive and to carry a pregnancy. Second, they can impact the chance of finding a partner. They can impact the willingness to have children. They can impact longevity, so that you don't live long enough to find a partner and have children. They can impact the offspring's chances of being born alive, for example you try to have children but have many miscarriages. And finally, they can impact the timing of reproductive behavior, or they can share common causes with established risk factors that also reduce fertility, for example alcohol consumption. So there are many ways in which diseases can influence childlessness. So what's our data set?
We take every man born in Sweden and in Finland between 1956 and 1968. We use this birth cohort because these people have reached 50 years of age by 2018, so they have concluded their reproductive period. And we take every woman born between 1956 and 1973; again, by 2018 they have concluded, or at least most of them have concluded, their reproductive period. With this cohort we can look across a very long follow-up and be sure that some of these people had children and others did not. The inclusion criteria are that you need to have been born in Finland or Sweden, you must not have emigrated, you need to have both parents known, and you need to have been alive until the age of 16. That last one is clearly a strong condition: we will not capture, for example, individuals with early-onset or childhood diseases that cause death before the age of 16. Children must not have been conceived by assisted reproductive technology, so we have removed those from the picture. They need to have been born alive, which means we cannot explore, for example, spontaneous abortion or other types of miscarriage. And they need to have been born before the end of follow-up. In total, if this is the index person, which are these two numbers here, then because of this unique multi-generational registry we can build the pedigrees and see who the siblings, the spouses, and the children are. In total these pedigrees comprise around six million individuals. We will now focus on 487,000 same-sex full sibling pairs. I will describe later why we are looking specifically at siblings, but keep in mind that this is an important aspect of our study. We use the approach I described before, in both Sweden and Finland, to capture around 1,700 diseases across different disease categories. It was very pleasant, and surprising to me in a positive way, to see such a good relationship when looking at the prevalence of diseases in the Swedish and the
Finnish populations. This tells us that, at least in Sweden and Finland, doctors tend to use ICD codes, these diagnostic codes, in a rather consistent manner. We don't know how well this extends to other countries in Europe; we have a project looking at that in France. But it is a good message that eventually we can try to bring together these registries across different European countries and really try to meta-analyze data across Europe. Just to give you a sense, around 30% of men remain childless and only around 18% of women remain childless. This is because some men have children with multiple women. The rate is similar in Finland and in Sweden.
Okay, now coming to the methods part: how can we use our data to answer potentially causal questions about the relationship between disease and childlessness? The first, more naive way is to look simply at the association between childlessness and disease, so just asking whether people without children have a higher prevalence of certain diseases. This is interesting, and it might be valuable, for example, when studying fitness or questions related to genetics or evolutionary biology. Even if having more children reduces your breast cancer risk, so that breast cancer risk is a consequence of the number of children, the genes associated with breast cancer will still tend to be less likely to be passed through the generations, and so the direction of the causal relationship doesn't really matter. So this makes sense for certain types of analysis, but it clearly doesn't make sense if you want to understand which diseases actually have a real impact on your chance of being childless. One way is adjusting for covariates, or propensity score matching, or you can throw in any other type of machine learning approach that tries, for example, to optimize the covariate selection or optimize
the matching of individuals. What you want to do is to emulate a trial and find a digital twin of yourself, by matching on, for example, your health history, socioeconomic status, and everything else, and then you look at whether adjusting for these covariates removes the confounding that affects the association between disease and having children. This is all good, but probably the best way to find the best digital twin is actually to use your brother. When we use a sibling design, we have a way to control for confounding without the need to specify covariates or other information. This gives an idea of how our sibling design works. Here, family one has two siblings, both women; one has a child and the other does not. Family two is a single-child family, so it will not enter our analysis. Family three has four siblings; three have a child and one does not, so we will match this person with the same-sex sibling here who has a child. In family four both are childless, so they will not count in our analysis, because we always consider siblings who are discordant. In this way you can create a kind of perfect matching: you take the sibling who is born closest to you, and then you follow both over time. One of the two, for example, will have a child and the other will not. We interrupt the follow-up at the birth of the first child, and then we take this follow-up time here and compare the disease incidence between you and your sibling. You can do this in different ways. We use conditional logistic regression, but you can also use survival analysis, for example, in which case you don't need to make this cut here; the results are very similar. So this is the idea: basically, imagine that you are running a trial where your placebo is
your sibling. If we do that, these are the results we get. Here is the log odds ratio relating each disease to childlessness: below zero means the disease increases the chance of being childless; above zero means it increases the chance of having a child, so it decreases the chance of being childless. As you see, most of the diseases are below the line. There are a few above, and I will describe those later. Another thing that is very important: these analyses are longitudinal, so the disease always needs to happen before the child or the end of follow-up, which should remove reverse causation. If you look at the disease categories, what you see here is that having a mental health diagnosis, for many different types of mental health conditions, decreases your chance of having a child. It is similar for the endocrine, nutritional, and metabolic system and for the nervous system, and there is also something for the circulatory system. As expected, diseases of the skin and subcutaneous tissue or of the musculoskeletal system do not really have a large impact. When we look at men, who are highly unexplored, you will again see that mental health has a substantial impact on the chance of being childless. These are not small effects: I think a diagnosis of schizophrenia, for example, reduces your chance of having a child by something like 60 or 70 percent. Interestingly, you also see something for the digestive system, and so on. I don't have time to go into each of these diseases, but there are interesting stories to be told about them. When we compare men and women, some interesting results come out. First of all, mental health related disorders have a stronger impact on the chance of having a child in men than in women. This reflects something known: externalizing behaviors, or certain types of behavior, that in men are seen as negative tend to be seen
more positively in women, so they increase the chance of finding a partner and therefore of having children. Interestingly, myopia, for example, is something that decreases the chance much more in men than in women. In women there is more that is related to a possibly biological effect, like diabetes, which is known to increase the risk of preeclampsia and other conditions that directly impact the chance of having children or having healthy children, as well as myocarditis.
The impact is also disease-specific with respect to age: you will see that not for all diseases does the time at which you are diagnosed have the same impact on the chance of having a child over a lifetime. For example, being diagnosed with obesity at younger ages has a more severe effect on the lifetime chance of having children than being diagnosed with obesity later in life, still before having children. This probably also reflects the severity of the obesity. It is especially strong in women; in men there is less of an effect. Alcohol dependence shows the opposite pattern: the effect on your chance of finding a partner and then having children becomes worse and worse with age at diagnosis. For other diseases, for example schizophrenia, the effect is pretty consistent across age. If you put everything together, so all the diseases that are associated with your chance of having a child over a lifetime, you see that for women the lowest point, which is these red dots, is at around 20 to 25 years of age, and for men it is at around 26 to 30 years of age. This is really the period in which you are searching for a partner, and where these diseases matter most. Indeed, this period is just before the average age at first birth, which is 26 in women and 29 in men. So it is clear that these diseases have something to do with the period in which you are finding a partner. So what is the main reason why these diseases impact your
chance of having a child over a lifetime? The main reason is that most of these diseases impact the chance of finding a partner. Indeed, if you look at the correlation between the effect of a disease on having a child and its effect on finding a spouse, you see an almost perfect correlation. Interestingly, the correlation is stronger in men than in women, reflecting that for women there are still some diseases that don't have to do with partner selection but rather act through a more direct biological process, while for men the impact is mostly through the chance of finding a partner. So what happens if we look only among married individuals? Now we exclude the possibility that you don't find a partner: we take only people who have married, and we look at how these diseases impact your chance of having a child once you are married. What you see is that, again, most of the impact comes from some of the mental health related disorders. This is quite interesting, and I think it is unclear why; probably family instability or willingness to have a child. For women it clearly has more to do with fecundity: you see an impact of obesity, and I think this is partially related to miscarriage as well.
Now, there were some weird results. If you remember, when we looked at this plot there were some diseases that increase the chance of having children, and we were really puzzled by that. What does that really mean? The first thing you can see is that some things pop up in both men and women and replicate very clearly in both Finland and Sweden. Those are tonsillitis and appendicitis, which are both diseases of the lymphoid system. And then, for men specifically, you have musculoskeletal injuries, and in
particular derangement of the knee. If you look at the underlying plot, you see that there are a lot of positive associations between the musculoskeletal system and the number of children. Our hypothesis for men is that this is probably due to the fact that men who engage more in sport and have a more active life probably have a higher chance of finding a partner, and so of having children. So in a sense it is a failure to correct for potential confounding. Tonsillitis and appendicitis remain a bit more puzzling. To understand whether this was some kind of causal relationship, for example something related to the immune system, we asked genetics for help. For a subset of these individuals, around 300,000, we have genetic information, and we have defined exactly the same diseases and phenotypes in the same way, so we can do both genetic correlation and Mendelian randomization to understand whether there is a causal relationship. Genetics is very powerful because, with something like Mendelian randomization, you have fewer problems related to reverse causation, and you can see whether there is a clear biological signal that overlaps between these diseases and the chance of having children, so your fertility. When we look at the genetic correlation, that is, whether the genetic signal underlying the risk of tonsillitis overlaps with the genetic signal for fertility, you see that there is a negative genetic correlation. To be consistent with the epidemiological observation, this should be a positive genetic correlation. Instead it is negative, so an increased genetic risk of tonsillitis goes with a lower chance of having children, which is the opposite direction to what we observe in the epidemiological association. And when we do an MR analysis, you see that there is
actually no relationship, or at least no relationship that is statistically significant. So we believe that the relationship we see between these diseases and fertility is probably not causal. There is something else going on, and we don't know exactly what; it might be some residual confounding or something like that.
Just to summarize this part: mental health, physical disability, and diabetes have the largest impact on being childless. There is a sex-specific effect, where mental health has a more severe impact in men than in women, while diabetes impacts women more than men. There is a strong age-dependent effect for some diseases; for most diseases the strongest effect is seen for diagnoses that happen around the period when you find a partner. And most diseases impact the lifetime risk of not having children by reducing the chance of finding a partner. (I think there is a misspelling there. Well, I guess you can call it being childless, or not having children. Yes, that's correct.) It is clear that some mental and endocrine diseases can increase childlessness even among married couples, so this is an effect that is independent of the chance of finding a partner. And there are very few diseases that increase the chance of having a child, as we expected. Some are puzzling, like removal of lymphoid organs, and musculoskeletal injuries in men. It is unclear why. Actually, this relationship between tonsillitis and fertility has been reported before in the literature, but it remains quite puzzling what is going on there.
I will use my last four minutes to show you a slightly different angle on this analysis, which is: what is the impact on your sibling of the disease that you experience? You share genetics with your sibling, so 50 percent of your genes, but you also share the environment, especially during the pre-reproductive
period, so you share socioeconomic status, parenting, lifestyle, and access to healthcare. And so we can start to look at how being affected with a disease has an impact not only on your own chance to have a child or not, but also on your sibling's. Notice that using this design you can also probe evolutionary theories like balancing selection, sex-dependent selection, de novo mutation, ancestral neutrality. I will not go into detail, but maybe the example I will show you at the end might clarify why this can be interesting. So these are the results that we observe. What you see here is: we take women, we look at the affected women. These are all different diseases; each dot here is a different disease, and they are ranked by their impact on the risk of being childless. Here we're taking only those that reduce the chance of having children and that are significant, and you see that there are some, this is probably schizophrenia or something like that, that are down here in the list. But what you can appreciate then is that if you take unaffected sisters, so they have not been diagnosed with the disease, the only thing is that they are sisters of these affected individuals, they also tend to have, for some of the diseases, a lower chance of having children. This is in men. You see that the relationship, so this decrease in the unaffected sibling, is a little bit stronger in men than in women. We are not sure why; probably because brothers are maybe more reactive to the familial environment than sisters, and they tend to be more similar once exposed to a certain familial environment. But you see that overall, unaffected siblings of affected individuals have on average more childlessness, proportional to the impact of the disease on lifetime childlessness. So the stronger and more impactful the disease is on your risk of being childless, the stronger is also the effect on your
sibling that does not have the disease. There is some sex-specific effect, and you might see here there are some weird things going on. So there are some diseases that in the affected individual reduce the chance of having children, but then in the unaffected sister they actually increase the chances of having children. So what's going on? Here are two examples. Schizophrenia is the typical example where, if you have schizophrenia, you have very low chances of having children, both in men and in women, a little bit stronger in men than in women, as I mentioned before. But the impact is there even in your brother and sister, even if they have not been diagnosed with schizophrenia. If you think about this from an environmental perspective, this can be the shared familial environment. If you think about it from a genetic perspective, this is probably both polygenic and de novo signal that is inherited, and so it brings you somewhere onto the spectrum: you're maybe not diagnosed with schizophrenia, but you show some of the traits that reduce your chance of having children. But then if you look at alcohol dependence, you see that affected individuals who are diagnosed with alcohol dependence have a substantially increased risk of not having children, more in men than in women, and this is what we showed before. But then unaffected sisters of people with alcohol dependence actually tend to have more children. This is, if you think about it again from a genetic standpoint, what we call balancing selection or sex-dependent selection: once you go above the tipping point of being diagnosed with the disease, or develop a severe form of the disease, the genetic predisposition impacts your life by reducing your chances, but an average dose, or an average level, of genetic risk for alcohol dependence in women is actually increasing the chance of having children, for example because this is related to risk-taking and
externalizing behavior, which are traits that are typically more appreciated in women than in men, and so they increase their reproductive success.
Okay, so here it is: unaffected siblings of affected individuals have on average fewer children, proportionally to the impact of the disease on lifetime number of children. Put in other words, if you have a disease that decreases your chances of having a child, your unaffected sibling is at a slightly higher risk of not having a child as well. There is a sex-specific effect: the influences are larger in brothers than in sisters. And then for some diseases there is so-called balancing selection, or other mechanisms in play, where you have a lower chance to have a child but your sister, for example, has a higher chance to have a child, and so things sort of balance out.
So, just to draw some conclusions about this work: high-throughput exploration of lifetime reproductive success really allowed us for the first time to compare magnitudes of effect, which is very important, and I think it is why I'm a big fan of this high-throughput exploration. When you have papers just focusing on one disease and one relationship with an outcome, you cannot understand how important they are in the big scheme of things, but when you compare all of them you can start to see the magnitude of the effect. And then it allowed us to identify, for example, novel hypotheses, or diseases that are not traditionally explored in their relationship with reproductive outcomes. It's quite striking to see how mental health related diseases potentially increase the risk of being childless, mostly by reducing the chance of finding partners. This is, I think, overall highly underappreciated: when we think about reproduction, we always think about diseases that have a direct impact on reproduction; we don't think about diseases whose effect is mediated through, for example, finding partners. We are now calculating how many
children are prevented, in a sense, so the reduction in total number of children attributable to mental health related issues. There are some unexpected findings, like lymphoid organs: the relationship with fecundity is probably not causal, and we don't know what's going on. And then the disease impact on childlessness transmits within families, either via genes or the familial environment, and it's quite interesting to see a balancing effect for a few diseases. This is all from my side. I don't have an acknowledgment slide because this is kind of the first time I'm presenting this, but this is all the work of Aoxing, and so really all the thanks should go to her.
Thank you, Andrea. Now, are there questions from the audience? Yeah, we are sending around a virtual applause to you, as you can see here on Zoom. So let me see. Yes, yes, I have a question. So Bowen goes first, then Giovanni, then Lucas. Bowen, please. We can hear you but rather low, please go close to the mic.
Okay, so thanks for the very interesting talk. I just have one question. I see that the data you collected are mostly from Finland and Sweden, right? Those are countries with very high average income. So do you expect the results to be very different if you apply your methods to data collected in regions with median income or very low income?
Yeah, I think in a sense we are capturing an older cohort of Swedes and Finns, so they might be more similar to lower-income countries, because this is not the modern setting. So for example type 1 diabetes is associated with having fewer children there, but with the advancing medical system we almost don't see that anymore. But I would think that clearly there are going to be some differences in lower-income countries, and I think a difference is going to be there especially when we look at mental health. In some countries, especially in African countries for example, there
is an enormous stigma on mental health, and so it's almost impossible to find a partner if you have certain mental health diseases. So maybe some of those effects, in countries where there's still a large stigma on mental health, will be larger. If anything, I think the effect would probably be exacerbated.
Thank you, Bowen. Thank you, Andrea. Giovanni is next.
Thank you for your talk, it's pretty interesting. My question is: considering that a lot of the data that you have to gather is on a pretty long time scale, since you have to wait for people to have children, you have to measure whether they don't have children, and so on, have you had the need, or have you thought of, controlling for socioeconomic factors and their changes over time? Because, for example, many societies tend toward lowering the number of children they have quite naturally, or because some of them could be confounders. I could see poverty, for example, being a cause of mental health issues and of not having children. Or possibly some minor factors like lymphoid removal could be there because some people could afford that operation, and then they are more affluent and can afford to have children as well. So I could see some confounders in those factors. Have you had the need to correct for them, or have you thought of doing so?
Yeah, so I guess that's the biggest challenge, the confounding and how you claim that this is causal. Because we use this sibling design, your control is not someone in the population; your control is always your sibling. So we're only comparing yourself and your sibling, which means that we are looking at exactly the same time period, and we're looking at a rather similar socioeconomic status, or familial socioeconomic status, because you tend to have children quite young, so you have spent most of your life living with your brother, and then you move out and then you start to have children. But most of
that is kind of controlled; you live most of the time in the same geographical area. So the sibling design is the best we can do. Probably a twin design would really be the best you can do, because then you're born at exactly the same time, but this is the way we use to avoid problems with confounding. We also do population-based analysis, so what you suggest, comparing each person with the overall population. There we adjust for familial income, we adjust for covariates, we adjust for time and other things. But you clearly see, for diseases where you expect to see an effect, that the sibling design works much better than any type of statistical correction.
Thank you. Thank you. Very quickly, a quick follow-up question: if you adjust for siblings, can you see a difference depending on whether the sibling you used to adjust is older or younger? That's a good question. Yeah, I'm very curious to see, because that could be related to shifts in society, although they are probably on a small time scale, so it shouldn't be different. Anyway, it's a good suggestion. I think someone suggested it to me before. Also, there are a lot of theories about sibling order, and whether, if your older sibling has schizophrenia versus your younger sibling has schizophrenia, that might have a different impact on you, because you look up to your older brother more than to the younger brother. I don't know if we have power, but it would be cool to see if the estimates change by checking the order. Yeah, that's a good suggestion.
Lucas?
Yeah, hey, thank you. Thanks for the talk, it was super interesting. Actually, borrowing a bit from Bowen, I was going to ask sort of the same, a bit paraphrased now. I was wondering about the transferability of the results, as I can imagine that there is a tremendous impact of local culture on many
of the findings that you report, and I was wondering if you thought of interacting with social scientists, for example, on the interpretation of the results that you get.
Yeah, actually two of the team members are sociologists and demographers, so we are interacting with them, I think especially on findings like this kind of balancing selection. If you do it from a genetic standpoint and you look at polygenic scores, for example of neuroticism or externalizing behavior, you will see that this type of polygenic score decreases the chance of children in men but increases it in women, and this is clearly societal, about what people find attractive in women versus men. So that's probably changing, and maybe now that we are moving toward equality you can use this type of thing also to understand how society evolves. But I think the fact that mental health reduces the chance of finding a partner is still quite true. Things like myopia, where we see for example a stronger effect in men than in women, that's also interesting, and maybe it's also partially societal, about what people find attractive. So yeah, we have some people on the team discussing that, but this is what we see in a limited cohort, and we will need other cohorts with long follow-up to be able to look into that.
Thanks, it was great, actually. Thank you.
I have one question now before we move to the chat-based questions. I think the question that you presented is very exciting, and I liked some of the findings you showed. But in the moment when someone has found a partner, the partner also plays a big role in whether you stay childless or not. Do you have the data to move this to a couple-based analysis after people get married? Because otherwise I believe you will miss 50 percent of the cases where a disease will
have a negative impact on a couple, and therefore they stay childless. So if it is the partner that is causing this, to phrase it shortly, then you're missing this if you're doing a single-individual-based study.
Yeah, that's true, that's a good point. We have couple registry information, so we can see, once you have formed a couple, how the disease pattern in the spouse affects your chance, or the couple's chance, to have children. In a sense it is similar to the sibling design that we have been through with the sibling comparison, right? There we're looking at how you having the disease impacts your sibling's chances of having children. We can do the same and see how your spouse having the disease impacts your chances to have children.
Yeah, I think this is also particularly relevant for the siblings, because you said that in the siblings the environment is identical. That's true up to the partner, right? So the main thing that is different between the siblings is maybe some parts of their genetics or health, but mainly their partners are the different factors.
Yes, yeah, it's an interesting idea. We can, for example, think about controlling for that, or just looking at it. I mean, there is clear assortative mating, right? So even if you have a mood disorder, once you find a partner, the partner is on average more likely to have a mood disorder, and so in a sense the effect on your chances to have children is not only due to your mood disorder but gets boosted by the assortative mating. And so there is this kind of multiplicative effect of the partner's disease on your own. So yeah, those are all things that are very interesting to study.
And now we have two more questions in the chat, so thank you for the answer. There's one question about the registry data you showed in the beginning: to which degree is the registry data you showed
from Finland publicly available?
Well, it's through collaboration. It's very complicated; I'm not going to go into the beauty of GDPR. But, and this is very important, this data will never be made publicly available. The closest thing we can do is to try to generate synthetic data. We tried some projects on that; at least I have never been able to get anything particularly useful out of it. But we have a governmental agency called Findata, which started just a couple of years ago, which does these things. You can apply from everywhere in Europe, and you can get access to this data, and they provide a secure computational system to do that. So the things in place are there, if you don't mind paperwork. And for this specific data set that we put together, access is possible through collaboration within the EU; because of GDPR we cannot move the data or have people accessing it from outside the EU. Naturally, we are always looking for potential collaborations.
There's one more chat question, from Leslie: this is already an impressively comprehensive database of health information. Are there any additional variables that you wish were tracked in it?
Well, there are some that are coming. So all the infectious diseases and COVID vaccinations, those are all things that have been added. And then the really cool stuff comes with the Statistics Finland data, which is again complicated bureaucratically. You have school classes, so you can see how diseases of your teacher or your classmates impact yours. You have a corporate structure in a sense, so you know who is in a company, who is your boss, and how that corporate structure works, so you can imagine doing stuff in that direction. So there are a lot of cool things you can do when you're touching the economic data, which are very rich, but again that's pushing the limit a little bit. I
think, you know, there is the concept of data minimization that needs to be taken into account, and we are already on the edge of what I think the authorities are fine with, because we cannot simply put everything together and ask every question we want.
Thank you for all the answers, it was a very fruitful discussion here. Thank you very much, Andrea, also for now taking another half an hour to meet our students and talk to them about career and research. I'm sure you have a lot to share, and we are grateful that you joined the summer school. Thank you very much for this talk.
Thank you, thank you for inviting me.