Thanks, Howard, very much. Good afternoon, everyone. My name is Joannella Morales. For those of you who don't know me, I'm a program director in the Division of Genomic Medicine at the NHGRI. It is my pleasure to welcome you back to day two of the workshop. And I'm really excited to introduce session three to you, which is focused on the application of multiomics to observational studies. We have four terrific presentations lined up today, and I'm very much looking forward to them. But first, let me mention a few housekeeping items to reiterate some of the things Howard mentioned just now. Each presenter will have 10 minutes to give their presentation and then five minutes to answer questions. And for the participants, if you do have questions, feel free to add them to the chat, but I will also be checking for raised hands. And as Howard said, when there is one minute left, I will chime in and let the speaker know. And I think that's all I've got. So with that, let's get into it. Our first speaker is Dr. Nathan Price. Dr. Price is CEO of Onegevity, a division of Thorne HealthTech. He is also a professor at the Institute for Systems Biology, where he and Lee Hood co-direct the Hood-Price Lab for Systems Biomedicine. His research interests are in computational biology, systems medicine, scientific wellness, translational bioinformatics, and biological networks. Dr. Price will be speaking to us today about using polygenic risk scores combined with multiomics data to provide insights into prodromal disease and prevention. Over to you, Dr. Price.
Great. Well, thanks so much, Joannella. It's a pleasure to be with everyone today. All the stuff I'll talk about today was work that was done at ISB. As was just mentioned, I've moved my full-time locus to Onegevity recently, but remain on leave at ISB. So I was asked to talk about polygenic risk scores and how they interface with multiomics. So that's what I'm going to dive into today.
And I'm going to try in the next 10 minutes to go through two brief stories. So this started for us with something called the Pioneer 100 Project, which was 100 people where we started collecting multiomics data of various different types. This involved things like whole genome sequencing, measuring about 1,200 different analytes out of the blood, and doing measurements of the gut microbiome. These were all done at three-month intervals during this period, along with looking at wearable devices, trackers, Fitbits, things of that nature. The idea behind something that we called scientific wellness was to try to take this information and make it actionable and relevant for people in their lives, so hopefully they would go on a journey with us over time. And then, through this, to create personal dense dynamic data clouds, or deep phenotyping, the kind of thing that we're all talking about here for multiomics longitudinally. And the goal in this study was not to do it around a particular disease, but to try to understand what happens in a general population before maybe a lot of them have gotten into a disease state. Following that, we started a company, which is now out of business, called Arivale, and we were able to generate these kinds of data over a period of about four years on nearly 5,000 people. The demographics are here, and I won't go through that in detail in these 10 minutes. So what have we learned so far from analyzing these data? We have a slew of papers on this. I'm just going to talk about two of them today that are related to polygenic risk scores. So this was work led primarily by Michael Wainberg, who's a postdoc in my group, with contributions from Laura Heath, John Earls, and Andrew Magis, looking at how deep phenotyping can help us look at the manifestation of genetic risk in the body as interpreted from PRSs. So the basic idea here is pretty simple. This is all published in a paper in PNAS last year, if people want to look at the details.
As individuals, we all carry variants affecting our predisposition for different traits. Of course, a genetic variant doesn't mean much in a test tube. If it means something in your body, it must result in some sort of altered biological function, some way that it carries out whatever those differential risk factors are. And if we take broad enough measures, there's a reasonable chance that we can capture some of what those genetic variants do, even way in advance of any disease symptoms, by looking at multiomics data. So metabolites, proteins, and clinical labs were the main elements we were looking at here, and the question was basically whether we see strong correlations with those in a general population. So for this study, Michael did a great amount of work and basically went through every published GWAS that was available at the time. We had some quality control cutoffs, and we identified what we thought were 54 GWAS that were well powered, often with multiple studies behind them, that identified genes associated with all of the different disease or trait categories that are shown here. And in essence, we then built polygenic scores for each of those, basically just doing a summation across the risk variants, or took ones that had already been built. And then the novel part here was to use this in the context of these 5,000 people that we had amassed, to see whether there were correlations that we could identify in the proteins, metabolites, or clinical measures associated with these traits.
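Editorial note: the "summation across the risk variants" and the correlation step described above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline. The variant weights, genotypes, and analyte values are hypothetical toy numbers, and the function names `polygenic_score` and `pearson` are made up for this sketch. It assumes each person's genotype is coded as the number of risk alleles (0, 1, or 2) per variant and that the per-variant weights come from a published GWAS.

```python
from math import sqrt

def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical toy data: three variants, four people, one blood analyte.
weights = [0.12, 0.05, 0.30]           # assumed GWAS effect sizes
people = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0]]
analyte = [1.9, 0.8, 1.6, 0.2]         # assumed analyte measurements

scores = [polygenic_score(p, weights) for p in people]
r = pearson(scores, analyte)
print(scores, round(r, 3))
```

In a real analysis one would also adjust for covariates such as age, sex, and ancestry, and correct for the number of score-analyte pairs tested.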
So, by doing that, we were able to show that we could find analytes that correlated with these polygenic risk scores. As hopefully won't be too surprising, clinical labs turned out to be the most correlated with those scores. That makes sense, since we've curated clinical labs because of their relationship to disease. But a lot of the knowns from the metabolites and proteins also popped up as being associated with these, and I'll just give, in the minute or so I have for this story, some simple examples. This is just taken from the top of the list alphabetically. There were 756 associations that we found for altered proteins, metabolites, or clinical measures associated with these polygenic risk scores. ALS had quite an interesting one, in the sense that really high genetic risk for ALS was correlated with higher amounts of omega-3s and lower amounts of omega-6s. That's interesting because it's kind of the opposite of what we typically think of, with omega-3s being healthy and omega-6s being unhealthy. We did find after that a paper in an ALS mouse model showing that omega-3s actually accelerated the disease and omega-6s delayed neurodegeneration in that model. So this is indicative of the fact that if we look at polygenic risk scores for a population, there can be some personalization to what you may or may not care about, or the degree to which you would care about something, based on your particular risk profile. It also pulled up some really interesting proteins. For example, in coronary artery disease, there's only one protein out of all the 400 that we looked at that was correlated with the polygenic risk score for coronary artery disease in an asymptomatic population. It turned out that was PCSK9, which is of course the biggest blockbuster drug target there.
Asthma was the same thing. We found one protein that was associated with polygenic risk of asthma. That's apparently in clinical development, with four different clinical trials ongoing on it as a potential target for treatment. So I think you get a pretty interesting signal for these early stages when you look at an asymptomatic population, because you haven't yet had all the response to disease. So the implications for the future from this first part are that, depending on a person's individual genetic profile, you can provide a prioritization of choices that they might make. And you can map the most genetically at-risk people for a disease and tailor approaches to trying to deal with the ways that risk may manifest before they get the disease. So the second topic I want to go through, in the three or so minutes that are remaining, is around the notion that you can also look at this multiomic deep phenotyping data to evaluate the effects of lifestyle change. And so here we have a polygenic risk score. It turns out that for LDL cholesterol you can predict it pretty well from the genome. So, you know, it doesn't know if you're vegetarian or you like to eat cheeseburgers every day or whatever it is, but basically you can get a pretty strong genetic signal for LDL. And today, because we don't really use genetics much in the clinic outside of cancers and a few other little pockets, we treat these individuals as the same. And so the question is, if you have elevated LDL cholesterol, and it's there because it's predicted by genetics versus it's there because of lifestyle, is there a difference? And what we found, which I found really quite fascinating, came from just binning people into five buckets. And in essence what we saw is that if there was a big gap, if their genetics predicted they would be low but they were high.
When they went through a lifestyle intervention they were able to lower their LDL cholesterol successfully, for 40% of people. So that's a big number. Now, the upper 40% of people were not able to lower their LDL cholesterol. In other words, their genetics predicted they'd be high, they were high, and when they tried to move it down by lifestyle, they in fact could not do that. This has implications, which we could get into, around statin usage and things like that, and we'll have a big paper coming out on it next year, hopefully. We'll get more into that later, but essentially what this shows is that there's a big difference in individuals, based on what their genetics predict, in how they respond to trying to intervene. So another element here, just to give a second example, is around HDL. Same thing. This is a genetic prediction, and it shows again that there's a strong genetic component to it, which is shown here. And if you ask the same question, are people able to change their HDL, what we saw was that most people were in fact not able to change it. If their genetics were predicting that their HDL was low, they had trouble raising it. But if their genetics predicted that they would be high. And this is all normalized to the starting value. So again, that gap makes a big difference. So, implications for the future: genetics are not destiny, but they quantitatively affect the outcome for lifestyle interventions. And we can design health strategies for people that highlight the areas where the most progress is likely. We can create a delta around every different clinical measure that a person has, to find where they have their genome working with them rather than against them, and you can map that out for someone in their entirety based on this kind of data. So with that, I thank everyone that did the work along the way. And thanks to Lee also, who's been a great partner throughout all this. And that is nine minutes and 58 seconds. Thank you. I'm done.
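Editorial note: the "gap" or "delta" idea in the talk, comparing what a person's genetics predict for a lab value with what was actually measured and then splitting people into five buckets, can be sketched as follows. The talk does not say exactly how the bucketing was done, so this assumes people are ranked by the delta (measured minus genetically predicted) and cut into equal-sized fifths. All values and the function names `deltas` and `quintile_bins` are hypothetical.

```python
def deltas(measured, predicted):
    """Gap between the measured value and the genetically predicted value."""
    return [m - p for m, p in zip(measured, predicted)]

def quintile_bins(values):
    """Assign each value a bucket 0..4 by rank (0 = lowest fifth)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * 5 // n
    return bins

# Hypothetical LDL values (mg/dL) for ten people.
measured = [160, 150, 95, 130, 170, 110, 140, 100, 155, 120]
predicted = [110, 140, 100, 135, 165, 120, 100, 115, 150, 125]

gap = deltas(measured, predicted)
buckets = quintile_bins(gap)

# People in the top bucket are high despite a low genetic prediction,
# which is the group the talk describes as most responsive to lifestyle change.
for person, (g, b) in enumerate(zip(gap, buckets)):
    print(person, g, b)
```

Response to an intervention would then be compared across buckets, for example as the change in LDL between baseline and follow-up within each bucket.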
Thank you, Dr. Price, for that fantastic presentation. I'm looking to see if there are any questions or any raised hands. Neil has his hand up. Go ahead, Neil.
Thank you. It's really interesting. So, I wondered whether, in these analyses, you're able to see the contribution of ancestry in any of these, or whether things differed by different ancestral groups or populations.
Yeah, great question. And the answer in most of these cases is yes, it does make a difference. Our population was, you know, pretty heavily Caucasian and then Asian, and then fairly small in the others, just from the way it was a self-selected group coming in. We've looked at a number of different items like this, and typically, if you incorporate something like a genetic sequence and you just take the first n principal components and map genetics in that way, you do find that there are differences in associations across a lot of these traits. And it's a huge issue, obviously, because I'm a big believer that pretty soon we'll view interpreting blood in the absence of genetics as archaic. And I'm pushing hard, as many of the people on this call are, in the healthcare system to try to get that adoption. And that will be a huge element when everyone's genetics is in there, and that will take care of ethnicity to some degree, right, because that's kind of a subsection of genetics. But we really should be interpreting all of that data in the context of what a person's genome says, and there's a huge amount of work to do to make that better and more equitable as well.
Thanks. I see Nancy's hand up. Oops, Nancy, if you're trying to speak, we cannot hear you. You're on mute.
You know, it was asking me to unmute in two places and I was on both. And so the concept of this delta I think is really interesting.
The genetic component versus the non-genetic component. But I think it is relevant to note that for some biomarkers, and LDL cholesterol is a great example of one, there's already clear data indicating that both the genetic and the non-genetic components are predictive of the outcomes, which is the reason that we use it as a biomarker. But I think there are plenty of biomarkers where the genetics really is noise for how we need to use the biomarker, so that the delta is the better biomarker. And in contrast, there are at least a few, especially of the newer metabolomic biomarkers, that turn out to be a single DNA variant in a single protein, and that is the sort of biomarker which we would probably rather measure directly just with the genetics, rather than the metabolomics. And so I'm thinking we need the genetics in part to do a better job with some of these biomarkers where the genetics is creating mean differences, or differences in the variances, for a biomarker where all of the genetics is noise for how we use the biomarker. Think of something like the way we use cystatin C, and its heritability of 40 to 60% based on family studies. I do think people have not internalized how heritable biomarkers are, and how much of that is sometimes predictive of the outcomes. You know, it may be mostly the genetics that is predictive, and maybe we ought to just use the genetics then, but there are definitely biomarkers that I think we can make better by focusing on that delta.
Yeah, Nancy, I agree with everything you said very much. And I don't want to imply otherwise, a lot of scientists have looked at this, but I just mean if you walk into your general clinic, right, they don't.
They don't use this at all right now, typically. Maybe in a few pockets, you know, you go to Mike Snyder's fancy clinic and you'll get this, but in most places you don't get that. And so it's true that, in my opinion, we should do an evaluation, or at least an interpretation, of every single clinical lab we measure: what's the genetic component, and what's the non-genetic. And you're exactly right, for some, it won't make that much difference. The genetics is not that big a factor, and we can use it like we have. For others, the genetics might be almost the whole ballgame, or it's noise, and we start getting something much tighter. Much like the LDL cholesterol. The reason I do the lifestyle examples is there's a ton of literature, and it says, well, you can kind of change it and you kind of can't, and for some people it does and for some it doesn't, and it's this murky thing. And you put genetics on it, and it's like, well, you can actually predict who will and who won't. It's a pretty strong prediction. You know, it's not perfect, there's still some individuality, but you saw the numbers: the 40% at the bottom for LDL can move it really nicely, and for the 40% at the top, it's really rare that they can do it. And so there are a lot of elements in there, and we should just be looking at that systematically for all the biomarkers that we use in medicine, because it's a big factor.
Thanks, Dr. Price. I'm afraid we've run out of time, so we'll now have to move on. And I see that there were hands raised that we didn't get a chance to get to. If you don't mind, please put your question in the chat so that Dr. Price can respond. I see a few other questions also in the chat. Okay, so let's move on to our second speaker. That's Dr. Kari Nadeau.
She is the Naddisy Foundation Endowed Professor of Medicine and Pediatrics and director of the Sean Parker Center for Allergy and Asthma Research at Stanford University. She is also the section chief of asthma and allergy in the pulmonary, allergy, and critical care division at Stanford. Dr. Nadeau's interest lies in understanding how environmental, immune, and genetic factors affect allergies, immune tolerance, and asthma, and today she will be speaking to us about understanding how multiomics tracks environmental influences. The floor is yours, Dr. Nadeau.
Thank you so much, and thank you to Howard and Judy for inviting me here today, and to the meeting organizers. And happy Juneteenth, everyone. It's really a pleasure to be here. I feel very lucky to be among you because our fields merge in so many ways and we've learned from each other. I also have been working with some of our colleagues at Stanford, including Howard and Mike, who previously gave a talk here, so there are a lot of collaborations behind what we're here to talk about today, which is how multiomics tracks environmental influences. So today I'll talk about measuring exposures and why we do that, and the tools to perform multiomics, as many of you are aware. Then I'll give some examples of how we've used this approach in sampling an observational cohort for exposure to pollution, which is a cohort in the Central Valley, and then for exposure to diet, which is a twin cohort here with monozygotic twins. And then I'll just add some conclusions.
So importantly, why do we monitor exposures? Why is this relevant? Well, of course, as all of you know, in the context of climate change it's really critical to understand how we monitor exposures in relationship to genetic risk factors and to genetics and epigenetics, and, with a lot of the tools in multiomics now, to understand to what degree DNA and cellular changes occur after climate change issues like air pollution, increased allergens, extreme heat, severe weather, and environmental degradation. There are also living environment issues and, as we spoke about with it being Juneteenth, environmental justice issues, with many people of color unfortunately bearing the brunt of climate change and being targeted with zoning laws in those areas that have the most toxic emissions. And there are also changes in vector ecology, so understanding the genetics of humans but also of vectors and potentially animals is going to be important, as well as water and food supply impacts and water quality impacts. And for me as an immunologist, I find it very instructive to be able to put this paradigm together to look at different environmental factors and how they affect the development of immune tolerance, because as infants we have been exposed to many different conditions. Throughout the world infants are exposed to different conditions, but you can see here the amplitude of signals that one could potentially assay for any genetic risk factors, with epigenetics and with different exposures that are monitored in the environment. And why is it important to look at the immune system? Well, from my perspective, it can really help us understand prevention and treatments, as Dr. Price just mentioned for some of the adult diseases. For me, it's about understanding the development of the immune system over time. The immune system has both acute and chronic measures. There are cells that turn over every six hours, like neutrophils, and there are cells that can live up to 100 years, like memory T cells.
So you can look at these cells and infer to what degree an exposure had an effect on their DNA or on other aspects of the cellular physiological mechanisms, and you can attribute it based on the timing of the half-lives of these cells and the timing of the exposure. For me, as a pediatrician and as an adult doctor, I want to understand how this moves forward within the lifespan of any individual, because it's not just about one exposure. It's about chronic exposures and repeat exposures over time. So, for example, within our approach we look at detrimental environmental exposures like pollution, tobacco smoke, pesticides, and diet, at how they affect the immune system in particular, and then at how that potentially leads to disorders like asthma, allergy, cardiovascular disease, and obesity, and affects cognition. So, getting down into the weeds a little bit more, we know from animal studies and from human studies that, for example, diesel exhaust particles, DEP, activate alveolar macrophages in the lung, depicted here. With epithelial barrier defects and activation through Toll-like receptors and alarmin signaling, you see a pro-inflammatory cascade of inflammasomes, as well as T cell differentiation toward a hyperinflammatory pathway. That can also occur directly through the aryl hydrocarbon receptor, through volatile organic compounds that are coming in through the lung as well as, for example, externally through the skin. So when we understood some of these mechanisms that were coming out of animals, we wanted to understand whether or not the same hypotheses could be applied to humans. And so we went to the Central Valley in California, which is one of the poorest areas of the country.
We decided to focus there because there's an environmental justice issue. There are many people of color there who are exposed to high amounts of air pollution from agriculture and from industry, but also because the valley runs right down the center of our state, so there's basically this negative suction of all the air pollution from other areas of California coming into the Central Valley, and especially into this area of the Central Valley, which is Fresno. So we gave about 1,000 families smartphones so that we can then track them via individual exposure estimates, to be able to understand to what extent they've been exposed to particulate matter or, in this case, volatile organic compounds that are part of partially combusted diesel exhaust. And with that we can draw these kinds of maps to look at exposures. We've been doing this now for over 10 years, and you can get exposures for a week or three months or longer. So we have these 1,000 individuals. We go down to the community and we ask them what they need. It's really important to make sure we have the perspective of the community first before doing these studies. We collect spirometry and questionnaires, as well as blood, saliva, and urine samples. We do this in collaboration with our colleagues at Berkeley, who really have done an incredible job at understanding individual exposure estimates, thanks to about nine pollution collection sites on the rooftops of Fresno. So we do individual air pollution estimates, we also collect PBMCs and plasma, and then we perform high-dimensional omics: CyTOF, proteomics, epigenetics, Ab-seq, ATAC-seq, and single-cell transcriptomics. The pollutants are measured with a very standardized, validated method.
So we collect ozone, nitrogen dioxide, particulate matter at 2.5 microns or 10 microns (2.5 is important because that can get into the alveoli), sulfur dioxide, carbon dioxide, lead, and other heavy metals. But then, importantly, we also track wildfire smoke, because the Central Valley now, unfortunately, because of Yosemite being exposed to wildfires given drought and climate change, is exposed to about 140 days a year of wildfire smoke. So we've been able to track wildfire smoke exposure as well. Wildfire smoke is about 10 times as toxic as typical air pollution. So we apply CyTOF, or time-of-flight mass spec, to the samples so that we can get high dimensionality of over 47 features at once, at the same time, in one individual blood sample. So this is a publication that came out last year showing that with the criteria air pollutants we can get a good fingerprint of which exposures are associated with different immune markers. This is an example here showing the variable importance in projection scores for each of these immune factors and the different criteria pollutants, and you can see that not all criteria pollutants are associated with the same immune markers. I think that's critical, but in any one individual they are unfortunately being breathed in at the same time. And here's another way to look at it. We also looked at CpG site methylation features using pyrosequencing. Here, in these loci, IL-4, IL-10, interferon gamma, and FOXP3, you can see the differential methylation that occurs at these CpG sites. The reason we focused on these is because IL-4 is associated more with allergic profiles, interferon gamma with the potential to fight infections, and IL-10 and FOXP3 with more regulatory pathways. So we then looked at proteins. It's important to look at DNA but, as all of us spoke about, also at proteins.
And so with the same individuals we collect plasma and run that on, for example, different proteomic platforms, and we looked at CRP and D-dimers, and we looked at IL-18 and IL-1 beta for the inflammasome pathway. And you can see here that in the individuals when they're not exposed to wildfires, you kind of get a sense for what baseline variability they have. But then when a wildfire occurs, during that exposure and even after two days of exposure, we can start seeing signals. This is at about three to four days of exposure to a wildfire. This is not within a mile of the wildfire. This is about 100 miles away from the wildfire, because the plumes horizontally affect quite a large number of people in California. So I'd like to move forward with the other cohort that we have. That was our cohort in the Central Valley, which is part of an observational study that we've been doing for about 10 years. We've also been looking at monozygotic twins. And this is work that we've been doing with Mike Snyder. This has been published in cell aging, but importantly we looked at metabolomics. What was really fascinating is that we used an unsupervised approach. We collected individuals from zero to 80 years old, looked at fuzzy c-means clustering, and identified over 770 metabolite abundances. And what's really fascinating is that, again through agnostic assessments, we could detect caffeine and black pepper with increasing age, along with all the dietary components of the individuals. And what was more associated with the correlation of these metabolic profiles was twinship, i.e., genetics. So, talking about genetic variants, this factor is probably important compared to just age alone. So I'll quickly wrap up. We're doing some more work on these twins with ATAC-seq, thanks to work with Howard Chang. And so our conclusions and next steps are to make sure that we understand more about exposure analysis and use better technologies so that we can look at the exposome.
Mike Snyder and others around the world are looking at this more carefully, to look holistically at any given person for a personalized approach to their exposome. And we need to also look at multiple exposures at any one time to understand the interaction of the exposures with the genetic risk factors of that individual, and then look at longitudinal exposures over the lifespan, which is also important. So I'll stop there. And if there are questions I'm happy to answer them. Thank you.
Thank you, Dr. Nadeau. And thank you for helping us focus on issues of environmental justice. That's really terrific. Okay, I see Judy has her hand raised. Go ahead.
A great talk, Kari. How systemic do you think asthma is? You talked about the environmental components, air pollutants, and so forth. And the rationale for my question is, how useful is the blood for looking at rare subtypes like hematopoietic stem cells or pre-mast cells that may be key in asthma?
Yeah. So, to your question, there have been a lot of studies on this. We've also done work with BAL to look at to what extent BAL reflects the blood and vice versa. There is an array of blood markers that one can now find associated with asthma. And what we have found is that with wildfires there's something like a 50% increase in the risk of asthma exacerbation for people that already have it. And a lot of the pollution exposures increase even just the induction of asthma, and we can actually follow that in the blood. So luckily, because of chemokines and because of some migration, we are able to capture these unique features of asthma in the blood. Great question.
I see Aaron's hand. Go ahead, Aaron.
Oh, sorry, quick question. That was terrific. Where is the field at with being able to validate some of this large-scale exposure data with some of the new methodologies? Is there work to be done there? Are there existing standards? You know, just give me a sense of how far we need to go.
I'm so glad you asked, especially with this symposium and hopefully a white paper coming out from it to be able to put forward some possible suggestions for research. I do believe we are, in some respects, there, but I think there is more to do, especially in understanding the tools that we've discussed today and how to apply them best to any one individual under any given circumstance and context of their exposure. I think we're just sort of scratching the surface, but I'd love to see, like I said, people looking at exposures over the lifespan and looking at multiple-exposure models, because any one individual isn't just exposed to, let's say, PM2.5. So, to answer your question, people like Carole Ober and others are looking at allergies and asthma, and we're doing a lot of work with her with her new methyl arrays. I think getting this to big populations, and sharing samples that are well characterized so that we can really understand this at the level of the genomics that needs to be done, is key. So I'm hoping that as we expand our cohorts, we can get there. Does that make sense?
Yeah, thanks.
I see a question in the chat from David Craig. Do you see or suspect differences in privacy concerns by different groups in joining studies collecting exposures via electronic approaches?
I think that's another whole symposium as well. Those are excellent questions. I'm also on the IRB, and so we discuss this at length, especially when we are working with underprivileged and under-resourced communities, where there are many sensitivities. And we approach the community here when we talk to the Central Valley groups. I think there's an appreciation of the science if you work directly with them. Of course, we also have to maintain confidentiality and make sure that nothing that we do could then be linked back to individuals.
This is a longer conversation, but to the extent that we can, it's important that we ensure protections and that we explain any potential risk to individuals. With epigenetic studies in particular, I have not found that we can directly link it back to any one individual, but that's always hypothetically possible. I do believe that it's possible in this day and age, and we need to understand this more. So hopefully there will be more compliance measurements around studying people electronically and through whole populations.
Thank you very much. It is now time to move on to the third speaker, but I did want to point out there's one question from Nancy in the chat. So Dr. Nadeau, if you can look at that question and offer an answer, that would be really helpful. Thank you. All right, let's move on to our third speaker for this session, and that is Dr. Corinne Engelman. She is a professor of population health sciences and director of graduate programs at the University of Wisconsin-Madison. Her research focuses on the study design and data analysis of large-scale omics, demographic, socioeconomic, behavioral, physiological, and environmental factors of complex diseases, including biomarkers and preclinical traits related to Alzheimer's disease and also vitamin D deficiency. Welcome, Dr. Engelman.
Thank you. Thank you so much for the invitation to talk to you today. It's not allowing. Okay. So I wanted to start by briefly describing our cohorts and the multiomic data we have available. I work with two cohorts: the Wisconsin Registry for Alzheimer's Prevention, an ongoing longitudinal study of initially cognitively asymptomatic individuals with phenotypic data from over 20 years of follow-up, and the Wisconsin Alzheimer's Disease Research Center, a longitudinal study of cognitively healthy individuals as well as those with mild cognitive impairment and Alzheimer's disease.
In addition to the phenotypic data, we have genomic data as well as longitudinal metabolomic and now proteomic data in blood and cerebrospinal fluid. In an initial analysis, my former graduate student Burcu Darst constructed an interomic network using pairwise correlations of the residuals of each type of omic data, shown on the left, after adjusting for relevant confounders. She then focused on the interomic correlations shown in this figure, since we were primarily interested in gene-metabolite relationships that could influence Alzheimer's risk factors. She found that no genes, shown in blue, were directly correlated with Alzheimer's risk factors, shown in yellow, but several genes were correlated with plasma metabolites that were in turn correlated with Alzheimer's risk factors. This research revealed a cluster of cerebrospinal fluid metabolites, shown in green in the lower right. They were associated with cerebrospinal fluid biomarkers for Alzheimer's pathology and have since been replicated in an independent sample. The community analysis also revealed a cluster of genes, plasma metabolites, and Alzheimer's risk factors related to cardiovascular disease and diabetes, located in the center left of the figure. This research just scratched the surface of what can be done with these types of data and opened up many additional research hypotheses to be tested. So for my talk, I was asked to highlight the barriers and opportunities in the longitudinal space and to draw out points that are generalizable to other omes and diseases. So the first barrier that I see is a dearth of studies with longitudinal omics data over many time points to see timing and trajectories versus random variation. 
We definitely need studies with at least four time points, and Mike Snyder's work that was shown yesterday suggested five time points to establish the timing and trajectories. Longitudinal metabolomics data are lacking for most, if not all, diseases, and I would imagine this is the case for other types of omics data. Another barrier is that most studies use a case-control study design, so you can't be sure if changes in the omes, other than the genome, are the cause or the result of the disease process, and that's shown on the right in this figure here. And this leads me to the first opportunity in the longitudinal space, which is to use longitudinal omics data in preclinical individuals to establish the timing and trajectory of pathologic changes. The barriers from the previous slide can be addressed by longitudinally studying individuals who are at risk for a disease, due to parental history, for example, but have not yet developed the disease. In the studies of Alzheimer's disease that I'm involved in, we have been following a cohort of adult children enriched for a parental diagnosis of Alzheimer's disease for nearly 20 years. We measure longitudinal biomarkers of the pathology of Alzheimer's, as well as cognitive function. By doing this we can establish the timing and trajectory of omes associated with Alzheimer's pathology and disease. For example, in work by a former graduate student, Danny Panyard, he examined the cerebrospinal fluid proteome to determine proteins that were found at different levels in asymptomatic individuals with no Alzheimer's pathology in the brain, that's the A-negative, T-negative group in this figure, versus those with the earliest signs of pathology, amyloid plaques, that's A-positive, T-negative in this figure, versus those with more extensive pathology, amyloid plaques and tau tangles, the A-positive, T-positive group. 61 proteins were significantly associated with Alzheimer's pathology after Bonferroni correction for multiple testing. 
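As an aside on the Bonferroni correction mentioned above: the idea is to divide the significance level by the number of tests, so that a protein counts as significant only if its p-value falls at or below that stricter threshold. The following is a minimal sketch of that rule only; the function name, the alpha of 0.05, and the p-values are illustrative assumptions and are not taken from the study described in the talk.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of tests whose p-value passes the Bonferroni threshold.

    The threshold is alpha divided by the number of tests performed.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values for five proteins (illustrative only, not study data).
example_p = [0.001, 0.02, 0.0004, 0.3, 0.009]
hits = bonferroni_significant(example_p)
print(hits)
```

With five tests the threshold becomes 0.05 / 5 = 0.01, so a p-value of 0.02 that would pass an uncorrected 0.05 cutoff no longer counts as significant.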
The patterns for the top 10 proteins are shown in this figure, with some proteins steadily increasing, for example, in the bottom right, or decreasing, in the bottom left, with Alzheimer's pathology from the early stages, and some not increasing until later in the pathology, when the tau tangles had formed. All of the examples in the top row show this pattern. The take-home message here is that you can learn a lot about changes that are happening in the body early in the disease process by measuring omics in individuals prior to clinical diagnosis of disease. Determining the timing of omic changes over the course of the disease process is important to distinguish omics that change early, and thus can be markers of disease and inform therapeutic targets and lifestyle modifications, versus omics that change later and can be used as diagnostic or prognostic markers. And this work by my graduate student Eva Vasiljevic shows a very simple example for two SNPs within one gene, APOE, which is the strongest genetic risk factor for late-onset Alzheimer's disease. An APOE risk score predicts the timing of mean cognitive differences, shown on the left, and the rate of cognitive decline, shown on the right. The immediate memory composite score and three other cognitive composite scores were tested and showed the same pattern but are not shown here. The mean differences between APOE groups began to diverge in the early 70s, while the rate of cognitive decline diverges in the late 60s. Similar analyses with polygenic risk scores and cognition, as well as biomarkers of Alzheimer's pathology from the cerebrospinal fluid, show similar trends. This shows longitudinal trajectories of an outcome by genomic data, which are static over time. And we have begun studying longitudinal metabolomics, but we only have two time points of metabolomics data on average. 
It's difficult to distinguish change over time versus random variation. Our next step is to measure metabolomics, and eventually other omics, at additional time points. This is necessary to move from cross-sectional correlations to patterns of change over time to identify cause-and-effect relationships. The second opportunity is using heritability estimates, if a large enough sample size exists, to determine whether the ome is influenced by genetics versus behavior or environment. We can do this by combining genomic data with other omic data. In this pinwheel plot of metabolomic heritabilities on the left, each bar indicates the heritability of the corresponding metabolite. The tall bars that reach out towards the exterior rings of the circle have a high heritability and are influenced by genetics. The shorter bars are influenced more by lifestyle factors. The different colors in this figure show different classes of metabolites, like amino acids, carbohydrates, or lipids. And we can use this information to inform further analyses. And that brings me to the final two opportunities. For omics with moderate to high heritability, we can use genomic data and Mendelian randomization to establish causality, that is, that the omic data are predictors of the outcome versus influenced by the outcome. And omics with lower heritability may be mediators of the relationship between behavioral or environmental factors and the outcome, and mediation analysis can be performed to better understand the mechanism. Examples of this were mentioned yesterday by Xihong Lin. One minute. Thank you. Work on both of these approaches is a major focus of my group currently. And I wanted to acknowledge the funding for this work by the NIA, and then also current members of my lab, and with that, I'm open for questions. Thank you, Dr. Engelman, for that very stimulating talk. I would like to open the floor up to questions. I don't see any raised hands at the moment. 
Oh, yes, there's one. Allison. Okay. Yeah, very nice. I was wondering if you had looked at CSF versus the plasma biomarkers or predictions to see whether there was a significant difference in the ability to make predictions with the two, because obviously that's one of the challenges for neurologic and psychiatric diseases. If you have to collect CSF, it's harder to get people to do it, particularly, I would say, in minority populations. And so, clearly, being able to find reliable markers in plasma would be significantly easier for experiments. And then my second question is about timing and when we need to start to collect longitudinal data for age-related diseases. There's a lot of evidence in Alzheimer's disease that by the time you have symptoms, the disease has probably been going on for 20 years. And I mean, in the WRAP study and others where you're studying adult children, do you think you've even started too late? You know, when's the optimal time to start collecting this data to be able to get the baseline, rather than going in when you've already got changes occurring? Right, so let me take that last one first. So in WRAP, when we started collecting data, they were initially age 40 to 65. We have imaging markers of both amyloid and tau, and we also have cerebrospinal fluid markers of amyloid and tau. And I definitely think that for most of those individuals we've started early enough; you know, we are starting to see those changes. One of the problems we're having in our cohort is that, in general, it's too early to see changes in cognitive function, so it's only after 20 years that we're starting to pick up, you know, associations even with APOE. 
So I think, for example for Alzheimer's, if you started in the 40s and early 50s, that's probably early enough. If you get much past the mid 50s, it might be getting too late for some of the earliest pathologic changes in Alzheimer's. And then for your first question about how well the cerebrospinal fluid biomarkers correlate with plasma: just like for the existing biomarkers that we have of tau and amyloid and some of the other, newer markers, it really depends on the marker itself. So for the metabolomics, for that cluster that I showed in the lower right of that network analysis, we found that the prediction in plasma was not nearly as good. There was a slight prediction; I mean, it was okay at predicting the biomarkers, but it certainly was not as good as the cerebrospinal fluid metabolites. And so I think it's really going to depend. There are definitely some that cross the blood-brain barrier, or where the peripheral levels still correlate well with the levels in the cerebrospinal fluid, so I think it's going to depend on the metabolite itself or on the biomarker. Thanks. There's one more hand raised, that's Mike Snyder. Go ahead. Yeah, maybe a related question is, you have two measurements, which I know isn't very many, and you may not be able to answer what I'm going to say, but you have some sense of the variation from those two measurements, and so can you estimate how many measurements you need to build, you know, personal trajectories with, you know, certain kinds of confidence? Yeah, I don't think it's clear yet. I think we would need at least, you know, really three or four measures on individuals, even to be able to start to answer that question. Yeah, I just think with two measures, even just from what we've seen with our cognitive testing, with two measures you just can't. 
You can't get a good sense of whether it's random variation versus there starting to be a trend. So I think we're going to need at least three, if not four, measures to really establish that, and for some of the metabolites or proteins, you know, three may be enough, and maybe for others that's not going to be enough. But yeah, I just don't think with our data that we know that yet. I assume these are fasting measurements, is that correct? They are, mostly. I mean, every now and then we have a participant that didn't, even though they should have, but yeah, they are, and we always perform a sensitivity analysis removing those individuals, so yes, they are. Okay, I don't see any more hands. So, thank you very much for your presentation. We will now move on to our fourth speaker of the session. Last but not least, Dr. Tesfaye Mersha, who is an associate professor in the Department of Pediatrics at Cincinnati Children's Hospital. His research interests include genomics, genetic ancestry, racial disparities, personalized medicine, and bioinformatics. Dr. Mersha's work focuses on integrating multiomics with statistical genetics and bioinformatics methods to uncover the molecular architecture of medical conditions, such as asthma and asthma-related allergic diseases. And Tesfaye will talk to us today about using a multiomics approach to better subtype heterogeneous patients and predict patterns. Thank you, Dr. Mersha. Thank you, Joannella, and thank you, everyone, for attending on a Friday afternoon. So I will be brief. The three previous speakers covered things very nicely, and I share their view, so I'll be very brief. So I'll start with a summary of my slides, which we put together and submitted for publication recently. So the hope of multiomics integration is that we gain from this integration a synergistic effect and are able to accomplish some of the tasks I listed here, and I will go through some of them, in terms of endotyping, especially in allergy-related diseases, and predictions. 
But as you can see, the integration process we discussed yesterday has challenges and opportunities that we need to work through, and we need some kind of standard approach. So why we need multiomics was already discussed yesterday: if we focus on one target, as we used to do with genomics, we always miss the rest of the puzzle, so that's why we need this multiomics integration. So in my talk I will try to address some of the approaches to reduce patient heterogeneity and improve risk prediction: the role of multiomics in endotyping, and, you know, starting from clinical phenotyping, how we can use computational tools for endotyping, and of course multiomics in risk prediction and patient classification. And finally I'll leave you with some challenges and opportunities that we have. So the key milestones in the field of allergy, which could also be applied to other areas, are that, you know, on the technology side and the clinical side, we have accomplished a number of steps. When I was a graduate student we used to type a few SNPs at a time. Now we have millions at hand in a very short time, so we have reached a multiomics arena where we think more about endotyping, which is closer to the mechanism, and stratify patients using some of the algorithms that we discussed already in the machine learning arena. So coming back to asthma, as a previous speaker discussed the issue of asthma and pollution: asthma, as most of you know, is highly heterogeneous and it's an umbrella term, so two patients may share a phenotype, but do they really share a mechanism? That's a challenge we face in the field. So, for example, we have the TH1 and TH2 types. As you have probably seen in the literature, even in COVID, some asthmatics are very severely affected and some are not, and that's related to the TH1/TH2 relationship, which goes back to their immunity. So there is heterogeneity we need to address through endotyping mechanisms. 
So, from figures we published a while back, we looked at the data from ComCare from dbGaP, and we actually clustered using random forest, and the output fell pretty clearly into three different clusters, which we call endotypes. So sub-phenotype or endotype. And then, since we have genomic data, we overlaid our ancestry groups, and we actually came up with three ancestral groups which are clearly, significantly different in terms of their global ancestry. So I think this is somehow showing us the translation of some of our approach: we can cluster phenotypes and then look back at the clusters in terms of genetic architecture. And we actually ran GWAS in each cluster, and we found new signals that weren't uncovered when we combined all those sub-phenotypes into asthma case-control status. So we again used multiomics, in this case transcriptomic analysis, to address the asthma puzzle, and we collected samples from different parts and organs, including blood and nasal and other different parts. I know we discussed the relevance of blood in asthma. We looked at those relationships and the predictive accuracy of all these different sources of samples. So, as usual, we followed the standard approach of data analysis going through each of these, but this is the traditional, conventional way of doing gene expression. And we looked at the hierarchical clustering. As you can see, we have really very high resolution when we run clustering based on the cells, the origins of tissue types. And as you can see, for nasal and macrophage, not at the cell level but at the tissue level, we see really high resolution, more than with a combined cell and tissue type, where you see nothing is resolved at all. So we also looked at the PC analysis, and we see a clear separation between subjects based on their expression value and asthma status. And we further looked at the overlap at the gene expression level, and then further at the pathway level. 
As you can see in this Venn diagram, we actually got more overlap at the pathway level than at the gene level, which is probably expected. And we asked, you know, can we really use easily accessible tissues or samples as surrogates for less accessible tissue? We were able to predict the transferability of some of the markers to less accessible tissue. So the work is published, so I'm not going through the details this time, but this was interesting work we did recently. And then we looked further using a deep learning approach. In this case, we have a tool from my collaborator passionate. And as you can see in the AUC curve, the deep learning approach has a tighter error and higher predictive accuracy than the SVM or the neural network. So we showed that the accuracy gets even better as you use more advanced methods, in this case, and also the prediction accuracy. So as Nathan talked about the PRS, we looked at different types of risk scores: this is from SNPs, this is from the methylation array, a methylation risk score, and a gene expression risk score. And we pretty much got good prediction. Right now we're at the point of combining all these different omics types to probably get a more accurate and applicable prediction. So in summary, I think we have high-throughput multiomics technology, but the challenge I face and I always try to address is that we don't really have high-throughput phenotyping, and phenotyping is really the limiting factor in many ways. We need multilayer models. The models we have now are more of a standard approach. As a previous speaker noted, long-term data is important, special taxon data, the exposure is important, as we discussed already, and the standardization and harmonization of clinical information. This is really important because we often talk about big data from multi-center, from multisensory, all these things, but eventually how do we really harmonize? 
So unless we have a clear phenotype, it really doesn't matter how big the data we have is. So I really want to focus more on phenotyping, because that's what we're going to end up correlating with. So the multi-term speed diagnosis requires a correct correlation of omics with detailed phenotyping. That's where the end goal is. So in this case, we have a pipeline, which is, as I mentioned, in the process of revision, under review, to use deep learning to resolve clinical phenotyping and prediction in asthma. So I will end with this approach. I know we talk a lot about multiomics, but in my view, multiomics is one part of the equation, and we actually need to look at all the lived experiences, the ancestries, the clinical and other factors, environmental factors, the exposome. If we integrate all these non-genetic, non-omics factors with omics, we definitely end up stratifying patients well and end up moving into the precision medicine area. So I will end with my acknowledgements to my group here at Cincinnati Children's, senior mentors here, and collaborators, and the funding, including internal sources and NIH. Thank you very much. Thank you, Dr. Mersha, for that great presentation. And we do have time for some questions. I'm looking for raised hands. I see one. Dr. Kelly, if you would like to ask your question. Hi, thank you. That was really great. I really enjoyed your presentation. We're also very interested in endotyping, and I was wondering, for your endotypes, if you've done any work, you know, thinking about the longitudinal data, looking at endotypes with samples collected at different time points. Do you tend to find that individuals cluster in the same way or in the same endotypes if samples are collected at different time points? Yeah, that's interesting. Good question. We have, from dbGaP, as you know, account care, they have a longitudinal nature. We have the data, but we haven't looked at the longitudinal aspect. 
Sometimes, as you mentioned, with some of these conventional clustering methods, it's very hard to know their consistency or whether you can replicate them. With machine learning you can cluster, but if you change something, it may change in the next round. So looking at a more biomarker-oriented classification, like we did using ancestry to classify, will probably help, but we haven't yet tried the longitudinal aspect. That's great. Thank you. Thank you. Judy, I think you're next, please. Tesfaye, great talk. Thank you. I've always been impressed by the heterogeneity of asthma. I'm not up to date on the GWAS of asthma, so how substantial are the population differences in the asthma signals? Yeah. Thank you, Judy. I mean, there are some studies in terms of diverse populations in asthma. To my knowledge, the majority of the GWAS catalog is based on European ancestry, but there is some work coming from UCSF and others on Latinos and other groups. There are a few signals that are African specific, some Latino specific, but, you know, most of these platforms are based on European ancestry. So I don't count those as really true differences, because the source material we use, the genotyping platforms, are all created with Europeans as a reference. So it's hard to say there is, yeah. So it has the same problem that many of us have, including in IBD. Yeah, I believe so. Howard, I see your hand up. Tesfaye, thank you for your talk. I enjoyed that very much. You mentioned the importance of the phenotype data. And I just wonder if you can tell us whether, when you publish your work, there's a detailed deposition of the phenotype data as well. What is the standard in the field? Because it seems important for other people to be able to make use of this data, or to make use of new analytic approaches. There are some standards for genomic and epigenomic data, but what about the phenotypic data? Yeah. I think it's coming now more and more, with Dr. 
Nadeau mentioned CyTOF and other approaches where we can get more biomarkers and detailed phenotypes. But right now, it's very high level. You know, some of the cohorts are based on a phone interview or some other, you know, high-level, questionnaire-based approach. So I think. But that's actually what I mean, the clinical phenotype data, right? So it's better than a column that says asthma, yes or no. You have way more detailed information, right, on these individuals, and that's actually what you need. Yes. Simply yes or no doesn't help, because we have this extremely heterogeneous phenotype, as I mentioned, and even the TH1 type is a good example. If you look at COVID, there is a mixed relationship between COVID and asthma, because if you are TH2, you have less risk from COVID than TH1. Those relationships can be observed with more phenotypes than we have. So I really deeply think phenotyping should be given much more emphasis, because with the technology, we can have really high-throughput versions of all these measurements. But how we get the phenotype is a big challenge to me. Thank you very much. And I think that concludes this portion of the agenda. I'd like to thank all the speakers for those terrific presentations and also all the participants for the questions. And if you think of other questions, feel free to put them in the chat and make sure that you indicate which speaker you want to answer your question. And with that, I'll pass this on to the next moderator, who is Jonathan Haines. Thank you.