So, thank you, Jessica, for the nice introduction, and thanks to the organizers for inviting me here, which actually was quite a surprise, because I'm not a neuroinformatician; I'm a biologist by training. So what you can expect me to do in the next 45 minutes is to actually add to your task load. I'm going to try to define the questions that can be asked of genetic and imaging data and of their integration, and the areas where improvement of data analysis and data sharing, for example, is still needed.

The ultimate goal of my research is to improve the diagnosis and treatment of patients with psychiatric disorders. You can start such research from many different angles; we've heard about imaging, etc. But a good place to start is also genetics because, as many of you might be aware, most of the psychiatric disorders that we know are very highly heritable. For example ADHD, which is the main focus of my own research, has a heritability of 76%, meaning that in an average patient about three quarters of the phenotype is explained by genetic factors, and this is quite a lot. So the way we approach our research is by trying to identify the genetic factors that underlie these psychiatric disorders and, on the other hand, also trying to map the biology that leads from the genetic defect or genetic variation to risk for disease.

So what I want to show you is both sides of the coin: first going into gene finding in psychiatric disorders, showing you what the models are and what some of the findings are, and then moving into pathways from genes to disease, with the different approaches that we can use here. So while being highly heritable, the genetics of these psychiatric disorders are highly complex, and that has made gene finding very, very difficult.
Highly complex meaning that in every patient, or at least in most patients, many genes act simultaneously, and if you look at different patients, different genes can be involved. We now think that in many of the psychiatric disorders we're dealing with the involvement of more than 1,000 of our 23,000 genes, and in each of these genes many different independent genetic variants can be involved in increasing your risk for disease. In addition to the genetic contribution, there's also an environmental contribution, meaning that environmental factors can either act independently on disease risk or interact with genetic factors and increase disease risk in this way. Most research on the genetics of psychiatric disorders has been done on genetic variants that are common in the population, so they are not restricted to the patient population; they are also found in healthy individuals, and all of them are of individually small effect size. That is also depicted here, where we're dealing with the common variants down here that are common in the population and have a very low effect size. So the way to find these genetic variants is by doing genome-wide association studies, and we've been doing these over the last, let's say, 10 years. What we learned from them is that none of us can do these studies by ourselves, so data sharing in complex genetics is essential.
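To make the sample-size argument concrete, here is a minimal sketch of a single-variant allelic association test. It is not the consortium's pipeline. The helper name `allelic_p_value`, the risk-allele frequency of 0.30 and the odds ratio of 1.10 are assumptions chosen only to represent a common variant of small effect, and the test is run on expected allele counts, not on real or simulated genotypes. The threshold is the genome-wide significance level of 5 times 10 to the minus 8 quoted in the talk.

```python
import math

# Genome-wide significance threshold quoted in the talk.
GENOME_WIDE_ALPHA = 5e-8


def allelic_p_value(n_cases, n_controls, control_freq, odds_ratio):
    """P-value of a 1-df chi-square test on the expected 2x2 allele-count table.

    Hypothetical helper for illustration: it uses the allele counts expected
    under the given odds ratio, so it shows the typical outcome at a given
    sample size, not the result of any real study.
    """
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    case_odds = odds_ratio * control_freq / (1.0 - control_freq)
    case_freq = case_odds / (1.0 + case_odds)

    # Every individual carries two alleles.
    a = 2 * n_cases * case_freq             # risk alleles in cases
    b = 2 * n_cases * (1.0 - case_freq)     # other alleles in cases
    c = 2 * n_controls * control_freq       # risk alleles in controls
    d = 2 * n_controls * (1.0 - control_freq)

    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper tail of a chi-square distribution with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    for n in (1_000, 5_000, 30_000):
        p = allelic_p_value(n, n, control_freq=0.30, odds_ratio=1.10)
        print(f"{n:>6} cases and controls: p = {p:.2e}, "
              f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

With these assumed numbers the same variant stays far from the threshold at a few thousand cases and only crosses it at sample sizes in the tens of thousands, which is the scale that pooling data across groups makes possible.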
Only by sharing our data and putting our data together in mega-analyses or meta-analyses can we reach the sample sizes that start to make sense. These are examples from the Psychiatric GWAS Consortium, which is certainly the most successful consortium in the field of psychiatric genetics, where people are putting together sample sizes of nearly 30,000 or even more individuals to find genetic factors involved in psychiatric disease. For schizophrenia, which is I guess the disorder that is most advanced, in the first stage they found seven different loci to be involved. There's now a second paper in press, which looked at 21,000 cases and finds 22 independent genome-wide significant loci, and I've heard through the grapevine that there are already 35,000 patients assembled now, and this gives you more than a hundred independent genome-wide significant loci, with, in many of these loci, different independent factors contributing to increasing disease risk.

Well, as I said, my own specialty is ADHD, and there we're not there yet. Even with data sharing in the Psychiatric GWAS Consortium, we have more than 5,000 cases and more than 13,000 controls, but we don't yet find a single genetic variant that can be called proven. So what you see here is the way we plot these data, where on the x-axis you have the different chromosomes and on the y-axis you have p-values for the single variants being analyzed on these different chromosomes. What we are doing is analyzing more than a million genetic variants independently, so we're doing more than a million tests, and that of course increases the multiple testing burden extremely. So if we want to accept something as being genome-wide significant, we only do so if it reaches a p-value of 5 times 10 to the minus 8. So that, yeah, makes you humble if you think about the numbers. Even for ADHD we're now doing better; I think we've crossed the 10,000 cases line, so we hope we find at least a single
variant now being genome-wide significant.

Even if we have 35,000 cases, such analyses are still underpowered, so people have been thinking about how to improve power for these studies. As I said, what is generally done is that we analyze each genetic variant individually. But knowing that different genetic variants in the same gene contribute in different patients, one might think that taking a different entity, like a gene or even a genetic pathway, and asking the question whether these are involved in disease might increase the power of your studies. In my group we have been doing such studies. Janita Bralten is a PhD student in my group who has been working with the data from the International Multicenter ADHD Genetics, or IMAGE, study, in which more than a thousand families with at least one ADHD-affected individual were phenotyped in a very deep manner for ADHD-related phenotypes. What she did was take well-established genetic pathways for ADHD, like the dopamine pathway, the serotonin pathway, and also a pathway that we found to be involved and that I will come back to later, the neurite outgrowth pathway, and ask the question whether these together are involved in increasing your risk for ADHD. In addition to taking the genes together, she didn't look at ADHD as a category, but she disassembled ADHD into inattentive symptoms and hyperactive symptoms, and that is because modeling studies have shown that only part of the genes that cause inattention problems and hyperactivity overlap with each other. So what she found indeed was that this joint group of pathway genes was involved in increasing your risk for hyperactive symptoms but not inattentive symptoms. This was confirmed in an analysis using a different rater, in which we looked at quantitative scores for hyperactivity and inattentiveness, and there the same result was found. Also, when she disassembled the
different pathways, she found that all of them contributed individually to this association. So all of them were involved, and by integrating the genes within the pathways we get more power to analyze genetic association: this was a study in 900 individuals, whereas for the genome-wide association studies we need tens of thousands.

So can we use this information to improve the diagnostics of patients with a psychiatric disorder? Well, that is an area of research where there's a lot of development at the moment. We know that we can use these data to make personal risk profiles, but for that we need much more genetic information from the different psychiatric disorders and also the possibility to distinguish between them. So far, even with a hundred independent hits, we can only explain 10 to 15 percent of the heritability of a psychiatric disorder like schizophrenia. So we're not there yet, but this is definitely a field where more research is required and that can lead to the use of genetics in the clinic, which is our ultimate goal.

So is there something else? Is there the possibility that other models underlie the occurrence of psychiatric disorders? People have been working on that, trying to find whether there are Mendelian forms of psychiatric disease, and there are families where you could think that this is really the case. People have also looked into the possibility that there are genetic variants that individually give you a higher risk, so that you would only need two or three, perhaps, instead of something like a hundred of them. The first results from this field were surely the rare copy number variation studies, where people looked at copy number variants, which are large genetic variants that can encompass entire genes. So normally you would have two copies of a gene, and with these copy number variants you can either have one or zero, or you can have three, four, five, etc. And from all the
different psychiatric disorders there's now evidence that there indeed is an enrichment of rare copy number variants in psychiatric disease. So if copy number variants are really involved, people are also thinking that other types of rarer genetic variants in the human genome can be involved in psychiatric disorders. This is a field where next-generation sequencing, so the possibilities that we now get for sequencing the entire genetic information instead of genotyping just a million of the three billion base pairs, for example, is moving fast and helping us to find new genetic risk factors for psychiatric disease. There are now already a number of papers out on autism where people have been trying to make sense of next-generation sequencing data, which give you huge data sets, so this is really big data science, and which show that about 10% of autism spectrum disorder cases might be explained by rare genetic variants that you can find with exome sequencing. Exome sequencing is sequencing the information that is in the coding part of the human genome, and the coding part is three to five percent of the entire human genome. We concentrate on this part because we can understand it. We can see: is there a protein formed, is this protein non-functional because of an amino acid change, is it expressed differently? If we look at the entire genome, there's so much so-called junk of which we don't know the function that we cannot interpret the data at the current time, so many of us are focusing on exomes.

Well, also in ADHD we have some nice examples that would definitely lend themselves to next-generation sequencing. For example, look at this family, where you have a clear segregation of ADHD through the family and where you would say, well, one genetic defect should be enough to cause such a segregation pattern. But still, we do this type of
analysis, and we now have eight different families where we did exome sequencing on three to five individuals, but in none of them do we find one single variant segregating through the family with the disease. So the models that we need to analyze these data with are not the Mendelian models that are used in monogenic disorders, where one single genetic defect is enough to cause a phenotype. The analysis of exome sequencing or next-generation sequencing data in complex diseases is a very active field of research at the moment. The models for monogenic disorders won't fit, so we have to find new models, and everybody's struggling with that. We're setting up a unit to do this type of analysis; if you're still looking for a postdoc position and are interested in this, please come and see; we're very interested.

So in general, when you think about identification of underlying genetic factors for psychiatric disorders, I guess we're doing quite well. We at least have the tools in hand now to do this type of analysis. What we do with the data is a different thing; the interpretation of data is really the bottleneck, but we're moving further.

With regard to the mapping of the biology from gene to disease, that is a totally different thing, and this is something we definitely need if we want to understand what is going on in psychiatric diseases and want to develop treatments that really treat the causes and not only the symptoms of these diseases. I want to show you two approaches that are used to map pathways. One is what I call bioinformatics, but this is really just in silico analysis of data that is already out there and that might be useful. This is work of a PhD student, now a postdoc in my group, Geert Poelmans, who has been taking the genome-wide association data that are out there, in this case for ADHD, and looked, even though we
didn't find genome-wide significant findings, at whether there is something that makes sense, something that converges in these data. So what he did was select genetic variants or genes that had a study-specific p-value of 10 to the minus 5 or smaller, and he looked at different software packages that tell you about enrichment of gene functions in biological processes. What he found during this analysis is that many of these software packages are very incomplete, so that when you do a systematic literature review you end up with much more information. I understand that yesterday there was a workshop on data mining; I think this is also a point where data mining or text mining can be improved to make these software packages much more efficient. So he selected 85 genes from the different genome-wide association studies and actually found that half of them could be involved in a process called neurite outgrowth. He put this into a network, and this is not a network as you see it in general network science; it just puts all the genes or proteins that are involved in neurite outgrowth into a growth cone of a developing neuron. You have different processes here, like sensing the environment, changing the cytoskeleton, changing the extracellular matrix, and changing the transcription and translation, so the gene regulation within this growing neuron, to make the neurite grow out in a certain direction. So for ADHD it gave us a new biological process to focus on in understanding what ADHD actually is.

Geert also built networks for autism, and these networks are based on much more data than we have for ADHD: there are exome sequencing data for autism already around, there are much more GWAS data, etc. What he found here was that three different processes could be found to be involved and enriched in the data. But what is most important is that with these
data he found sort of hubs within the networks and also between the networks that make excellent candidates for new leads for treatment development. This is, I think, the strength of this network approach, or this, yeah, enrichment approach: that we can really find new targets to turn to when we think about innovation of treatment. For me this makes it very interesting, and a rather shorter route from understanding biology to going into treatment development than I would have thought myself.

Another approach to understand what is going on when you have a genetic variant, and how it leads to increased disease risk, is Cognomics. Cognomics stands for Cognition Genomics and actually is a science that tries to map the effects of genetic variants on behavior via the brain. When I talked to somebody this morning, she said that geneticists tend to see the organism as a black box, and I hope that this can assure you that we're not seeing the organism as a black box. We're very well aware that there are things between the gene and the disease that are worth looking at. So when you think about how a genetic variant can affect your behavioral phenotype, it probably, together with other genes, alters cell function, and in this way has an effect on the morphology and/or the function of the brain, and together with other genetic variants doing the same thing, you end up with an effect on behavior, so these effects integrate towards disease symptoms. So when you want to explain these pathways, one place to look at is really brain morphology and brain function, and that is what I want to show you in the next few slides.

In Nijmegen we set up the Cognomics program, in which we now have data on more than 3,600 individuals. Cognomics really is an umbrella across a number of different studies, including the Brain
Imaging Genetics study, which is a study on healthy individuals from which we have mainly structural data of the brain; then the IMpACT-NL study, which is the Dutch part of the international multicenter study for persistent ADHD, so these are adults with ADHD; and there is NeuroIMAGE, which is a study on children with ADHD.

I want to show you one example from IMpACT, where currently four PhD students are working. Within the international study we are focusing on functional and structural neuroimaging in these patients with adult ADHD. So what we do is bring them in on two days for an extensive analysis of the clinical phenotypes and the neuropsychology, and we also get imaging data and genetic data from them. Martine Hoogman is one of the PhD students, former PhD students I have to say, and she did a study using one of the candidate genes for ADHD, called NOS1, coding for nitric oxide synthase 1, which is known to have an effect on striatal activity and impulsivity. NOS1 is a candidate for ADHD both from candidate gene-based studies and from the genome-wide association studies, and it is known to be highly expressed in the striatum. For ADHD the striatum is an important structure: it is involved in reward and impulsivity and is known to be hypoactivated in patients with ADHD. Martine used a task called the modified monetary incentive task, also known as the Knutson task, where people in the scanner were told that they had to click on the button as soon as they saw a target, a white circle, but before they saw the target they were informed whether or not they could earn a monetary reward by pressing as quickly as possible. She then looked at the contrast of reward over no reward. What you see then is a robust activation of the ventral striatum, and Martine could replicate earlier literature, indeed showing that ADHD patients have a lower activation due to reward than healthy individuals have. But when
she looked at the effect of a genetic factor known to be involved in ADHD, so NOS1, where the SS genotype is the risk factor for ADHD, she actually saw the opposite: those with the SS genotype had a stronger activity in the striatum than those without, and this was the case both in the affected individuals and in the healthy controls. So we were quite puzzled by that, but then we used another task for impulsivity, the delay discounting task, which is a behavioral task, and we analyzed it in a behavioral way, and there we actually saw the same. Here, the more impulsive you are, the lower the bar is that you see, and here especially the patients with the risk factor had a much lower bar than those without the risk factor. The same was true for the controls, but this didn't reach significance. So what this told us was that NOS1 was not actually tagging ADHD but was tagging impulsivity, and this is very well in concordance with a paper showing that this NOS1 variant is actually a risk factor not only for ADHD but for different impulsive behaviors in humans. So it tells us that the effects of genes can be quite indirect, affecting traits that feed into the disease rather than the disease itself.

So this was an example from the function of the brain. We can also look at the morphology of the brain, and I already told you about the Brain Imaging Genetics study, where we now have 2,500 individuals included. These are healthy individuals; most of them are healthy young adults, and most of them are also students from our own university who come to do our experiments. What we find there is indeed that we can find effects on specific brain regions due to genetic risk factors for disease. This is a study of BDNF, which is a risk factor for depression, so not ADHD in this case, and we were able to show that this has an effect on anterior cingulate cortex volume in the healthy individuals, but it only has this effect when
there is also an environmental factor present that is also a risk factor for depression, namely childhood adversity. The people in BIG are scanned at different field strengths; some of them are scanned at 1.5 tesla, others at 3 tesla, and we were able to replicate our findings at both 1.5 and 3 tesla. So also on brain structure you see effects of individual genes that feed into disease risk.

In BIG we can do candidate gene studies, but we also want to do genome-wide studies looking at effects of genes on brain structure, and if you want to do that you need international networks again. For that reason, I think it was in 2010, Paul Thompson and Nick Martin founded the Enhancing Neuro Imaging Genetics through Meta-Analysis consortium, or the ENIGMA consortium. We became one of the first members and part of the central support group of ENIGMA, and the first ENIGMA project, which we did together with a lot of different people, was to ask the question which genes contribute to hippocampus volume and different measures of total brain volume. The way ENIGMA works is not by asking everybody to give us their data and doing the analysis across the entire data set; we're using a crowdsourcing method in ENIGMA to get to our results. So what we do is that the ENIGMA support team writes protocols and tests these protocols, and these protocols are for the segmentation of the brain, which uses things like FreeSurfer and FSL FIRST, but also for the imputation of genetic data and how to do the association studies. Then every group can perform the analysis on their own sample by themselves at their own centers, so there is no need to share the data, which is often a problem because of ethical approval, etc. The ENIGMA support team only collects the summary data, does an extensive QC on them, and then performs the meta-analysis. The first ENIGMA study in this way brought together data on more than 13,000 samples from, I think, up to 20 different groups, and we indeed
found effects of genetic variants on brain structure. We found one genetic variant being significant, so passing the threshold of 5 times 10 to the minus 8, for hippocampus volume and one for intracranial volume. We replicated our data in another consortium, the CHARGE consortium, which added another 10,000 individuals to this analysis, and we ended up with p-values of 6 times 10 to the minus 16, which is very nice. Interestingly, the genetic variant associated with intracranial volume was also associated with IQ, so there is again a link between brain volume and behavior, and that is the most interesting thing for me in this analysis.

This is to show you that ENIGMA not only wants to do the analysis; they also want to have people use the analysis. So we have a website where we have all the protocols that you can use, and we also have a tool where you can look up the genetic effects: you can put in either a gene name or a genetic variant name, and it will tell you what the p-value was for association with hippocampus or intracranial volume. We're now putting together a new project, which will look not only at hippocampus and intracranial volume but at all the different subcortical brain regions, and this currently has a total of 18,000 participants or samples involved.

As I said, the interesting thing for me to look at is really whether these genes that we find for brain structure also have an effect on behavior, and behavioral abnormalities in this case, like we find them in psychiatric disease. To answer this question we're making use of the strength of ENIGMA and trying to combine it with the strength of another consortium, the Psychiatric GWAS Consortium that I told you about before. So we're trying to meta-analyze and link the data from the PGC and ENIGMA, trying to find out whether there's overlap between the genes for hippocampus volume, for example, and schizophrenia, and this is
currently ongoing, and we're going to start the analysis within the next month.

So if I have to say where data analysis and data handling can still be improved, then it is definitely in the area where we combine data on multiple brain regions with data on multiple genes. What I've been showing you is that we can use candidate brain regions and integrate data on multiple genes with them. We can also do it the other way around: we can use a candidate gene and look at multiple brain regions. But we cannot yet do this combination, because the dimensionality of the data really gets prohibitive. So this is an area where I foresee a lot of method development will be needed.

So looking back, what can we say about where we are in terms of our ultimate goal of research? Well, we still have to go quite some way to identify all the underlying genetic factors that are involved in psychiatric diseases. We need to optimize or use multivariate methods to maximize the power of our analyses, we need to generate personal risk profiles to be able to use them in diagnosis, and we have to make next-generation sequencing work for complex disorders. On the side of the mapping of the biology from gene to disease, we need more complete databases for in silico analysis, so the bioinformatics tools that are around, the pathway software, are currently too incomplete to answer our questions. Also, something that I haven't shown you in this presentation is that we need affordable model systems: where we don't have any data yet, we need to generate data, and we can do that only in model systems that are affordable, fast and flexible. My own group is currently doing analyses of ADHD in fruit flies, Drosophila, and this actually works quite well: if you knock out a gene that increases risk for ADHD in humans, it will also make your fly hyperactive. Last but not least, multivariate analysis methods for
Cognomics research are also something that is really needed.

So I was faster than I thought I was (you didn't need your five minutes), because this is already where I want to thank a number of people. Of course the list of people who are involved in these large-scale projects is huge, so I'm putting here those of the ENIGMA support group, the Cognomics people, and the PGC participants that play a role in the joint analysis of the ENIGMA and the PGC data. Thank you very much for your attention.

So thanks for an interesting talk. I wonder about these different psychiatric disorders: do you expect that some genes are involved in many of them, so that that can give you some help, like more statistical power in your analysis?

Yeah. So what the Psychiatric GWAS Consortium has also done is a quite agnostic analysis, where they just put everybody together that had a disease, so whether they had schizophrenia or ADHD or autism, they put them all into the patient group and compared them to controls. And indeed, in these analyses a number of genes pop up that give you an intrinsic susceptibility to psychiatric disease. Of course that improves your power to perform the analysis, but it reduces your ability to use genetics in diagnosis, because the specificity is just lacking. What we see more and more is that the clinical definitions of psychiatric disease are not very well related to biology. So what I foresee is that within five years we will have to look at these psychiatric definitions again and try to come up with a categorization that is more biology-based.

Yeah, hi, thanks for the talk, that was great. I've got two questions. One is on the data sharing aspect. It's great to do a meta-analysis, but it would be even better to be able to share the data. You mentioned that there are ethical problems, but it is also kind of non-ethical to not try to share the
data, in the sense that with all the data around we could, you know, advance much quicker in those disease identifications. So what do you think is the path for actually being able, with the appropriate informed consents and things like that, to share the data, almost as a moral obligation somehow? So that was my first question. The second one is: there are a couple of multivariate analyses, you know, brain versus genetic multivariate analyses, in the literature. Why aren't they used more? Is that a software problem? Is that an interpretation problem?

Okay, so with regard to the data sharing, I agree entirely that sharing of primary data will increase the ways in which you can use these data. Already with ENIGMA and PGC we are now definitely limited in what approaches we can use to combine these data with each other, so there is definitely a point in improving this and getting people to really share the primary data, absolutely. In terms of getting the field forward, this was the best way, because that makes the threshold for people to share or to participate in these analyses much, much lower. But I agree definitely that we need to come to a model where people really share. I know, for example, that at MRN they are preparing software with which you don't really need to place your data at a different site, but analyses can be done remotely on your data. So that is another model that might also work for this type of data sharing: keeping the data in the place where you want them or where you need to have them, and having other people do analyses on them.

With regard to the multivariate analysis, you are right, there are a number of methods now becoming available to do this type of analysis, and indeed the interpretation of the data is much more of a problem than finding the new methods, but
still, this is a problem that we need to solve.

Yeah, it's a very nice talk, thank you very much. With regard to the disease description, do you think there is a possibility that we could throw away DSM-5 or DSM-IV and see if we could find a way where the 29, 35, 50,000 patients could express their own disease, so starting from the data instead of the categorizations?

It's a very good point. I think that is the way to go, definitely. On the other hand, the input data are a little bit of a problem, because they were collected using these categories, so you probably don't see the entire spectrum. That makes it a little bit more difficult, but it's a good way to go, to start from the data. Yeah, more task load for the neuroinformaticians.
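The ENIGMA workflow described in the talk, where every site analyses its own sample and only summary statistics are pooled, can be sketched as a fixed-effect, inverse-variance-weighted meta-analysis. The talk does not say which meta-analysis model the consortium used, so the weighting scheme, the function name `inverse_variance_meta` and the per-site numbers below are assumptions for illustration only.

```python
import math


def inverse_variance_meta(site_results):
    """Pool per-site effect estimates without access to individual-level data.

    site_results: list of (beta, standard_error) pairs, one per site.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    Hypothetical helper; fixed-effect inverse-variance weighting is assumed.
    """
    # Each site is weighted by the precision of its own estimate.
    weights = [1.0 / se ** 2 for _, se in site_results]
    total_weight = sum(weights)

    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, site_results)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Made-up summary statistics for one variant from three sites.
    sites = [(0.10, 0.05), (0.08, 0.04), (0.12, 0.06)]
    beta, se, z, p = inverse_variance_meta(sites)
    print(f"pooled beta = {beta:.4f}, se = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

Only two numbers per variant leave each site, which is why this design lowers the threshold for participation. It is also why the pooled data cannot be re-analysed in new ways afterwards, the limitation raised in the discussion.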