Well, thank you, Erin, and thank you, everybody, for all the great insights and discussion. Judy's and my role here is really to facilitate further discussion and to pull your input together into some specific ideas and recommendations. Thank you to Erin and to our staff for taking notes throughout the meeting; they've sent those to us, and I've summarized a few key points. So, if I may, I'm going to share my screen and walk through some of the recurring themes I noticed in the notes, and elaborate a little bit on each.

First, we've heard a lot about the value of multiomic data, and that value shows up in several ways. There's the ability to interpret, for example, non-coding variants, which make up the vast majority of GWAS hits; they're a cause of disease, and we might understand them better with multiomic readouts of non-coding regions. There's also the capacity to capture information about the environment and exposures, going beyond the genome. Several comments related to the fact that a lot of current multiomic studies, whether from blood or from some sort of bulk tissue, do not get at the potentially pathogenic cell. We heard a beautiful presentation from Sarah about how single-cell information can give us an opportunity to get at these rare pathogenic cells. So one area of continued technological development that NHGRI might support is single-cell multiomics: capturing all of these kinds of information from the same cell, potentially even with spatial context.

A second theme that came up was the value of longitudinal data beyond case-control studies.
Mike Snyder talked about longitudinal imaging; longitudinal information gets at trajectories, and you need multiple measurements from the same individual. Those kinds of studies are inherently powerful because the other time points serve as controls for that person's disease process. A lot of studies are done mostly in blood, so the question is: can we go beyond sampling blood? Then you have to think about what other samples are frequently collected from our patient populations. Many of those exist as FFPE; that's the state the vast majority of these samples are stored in. So if we want to push the technological front, we would have to make our multiomic or single-cell multiomic methods work with FFPE and other common solid-tissue samples, which pose somewhat different technological challenges than blood.

A third theme relates to computational methods to harmonize and integrate all of this very rich data. There's a lot of work that needs to be done there, both for standardization and for gaining insight.

And finally, we heard a lot about phenome data. This is perhaps the most interesting and challenging aspect. There's a lot of information in the EHR. How can we explore electronic medical records without either missing information or compromising privacy? Is there a need for some further level of annotation, and how is that going to be paid for? And then there's sharing these data; I personally imagine that NHGRI's position, and perhaps policy, on this could be important in setting the trend.

So those are some of the themes that have come up. Let me open this up now for everybody to chime in: do you agree or disagree with these points, or do you want to add more? Please raise your hand or post questions in the chat. Sarah?

Hi, Howard. Thanks.
So I think all of those make sense. They're sort of infrastructure and technology things that I think we can all agree on, fairly uncontroversial. But, speaking to the clinicians here: would you say there should be a focus on particular diseases? Or are you looking at this as more population-based, with the phenome data, EHR, annotation, data sharing, and so on? How should we think about this? How should we prioritize populations and cohorts?

Right, so that's a really broad question. I think it's most likely that the multiomic technologies we've heard about are broadly applicable to many kinds of diseases, and broadly to the phenomenon of aging. We heard about studies of Alzheimer's, heart disease, even neuropsychiatric disease. So I don't think we should pigeonhole this technology as applicable only to certain diseases. My own work is in the area of cancer, and there are a lot of great applications there.

I completely agree with that. But, not being a clinician, I understand there are something like twenty-two thousand registered disease terms, and it's not financially feasible to address every single one. Aging is something everyone can agree is important, but beyond that, should we be prioritizing areas of unmet need? Where can, let's say, single-cell multiomics, or FFPE and spatial technologies, have the biggest impact, in your opinion or anybody else's, from a clinical point of view?

Right. I think that with any new technology, you have to meet medicine where it's at.
So the multiomic methods we're talking about require some access to patients' cells. Maybe in the future patients will just breathe on something or spit on something and you'll get multiomic data, but at the moment we still need access to the cells. So the kinds of clinical practice where samples of the tissues of interest are constantly being taken: that's where this technology can start to get adopted. That's the practical answer.

Okay, I see a hand raised; is this Eric? I can't see the name, sorry about that. Please go ahead.

Hi, this is Phil.

Okay, sorry.

No problem. I think what we need to do more of is understand the variation among healthy individuals, and in particular biological rhythms, which affect all these measurements, whether seasonal, diurnal, menstrual, or other cycles. For many dimensions these are not well understood, especially at the single-cell level, and we're beginning to explore this through the HCA and other efforts. But I think we probably need to do this at a larger scale, to really capture the full diversity of the human population in terms of these rhythms, and also the effects of the environment. Some of the existing cohorts have done some of this, but we may need more structured sampling to really get a good baseline on variation.

Great, thank you. I have a comment, or a question, for folks who work with EHR data, because I can imagine that the clinical annotation varies: different research groups have different scales and different descriptions, so it's very hard to compare across multiple studies. How do you deal with that in EHR data?
Can you, for example, export the ICD-9 codes associated with the individuals in your study? ICD-9 codes are insurance billing codes used, of course, throughout the US, so those would all be in common. Is that feasible? Is that allowed? I'd love to hear from people who have more experience with this. Craig, you have a comment.

Let me stop sharing so I can see everybody. The big comment I have is that a lot of our effort right now, as part of this Orion network, is spent with clinical abstraction specialists who have to go into the EHR and structure it. It's the biggest effort we have. I would love a magical solution, but right now it's just paying somebody to do it manually. We have 600 patients we're going through right now, at maybe 20 or 30 patients a week at best.

What exactly is your annotation team doing? Explain to us a little of what is actually happening.

They're looking at what drugs are given, but they also have to understand the context a little. This is a cancer-tumor-registry type of person who's going in and seeing: were they given a therapy? Did they go on an immune blocker? Some of the data is structured, but some of it isn't. They may be former nurses or similar, who can go into the EHR and put the information into REDCap based on a framework defined by our consortium. Orion is one of these networks, funded by a private group. Basically we have 600 fields that we fill out, all of the things the consortium defined, and they just do it manually.

OK. Atul, and then Nancy.

Yeah, so as you know, ICD-9 and ICD-10 codes are pretty arbitrary. The codes themselves seem so precise, but it all depends on the billing folks and what they bill under each code. And the docs themselves, right? Docs obviously go between hospitals all the time.
We never retrain them to know that a disease is called this here and that there; they usually mean the same thing. But it's important to note that these codes are used for billing. And obviously a lot of the meat of a phenotype, say the number of joints affected, is in the text notes. You can use concept identification to try to extract that, which is crude and has a long way to go; there's a lot more fundable science possible there. Otherwise, you hire curators, and that's what some of these companies are doing, right? Flatiron has 500 curators; Tempus has 500-plus curators. That's how they really get the phenotypes.

I just put it in the chat: there's this thing called PheKB, which I think comes out of Vanderbilt and many other places. These are heuristics, or algorithms: if you see rheumatoid arthritis coded three times, and you see this drug, then call it rheumatoid arthritis. It's a crude mapping from EHR data to phenotypes, but it's been empirically validated, for example from the genetics point of view. It's just a start, though, and I don't even think they get funding; they probably should. I'll stop there.

If I may follow up on that point: once you have these 500 curators going through and reading all these charts, and a group then publishes a paper, is there also a data matrix, a table of all the phenotypes? Or does that somehow disappear, so that nobody can get access to it?

In my experience, when the companies do it, of course they don't want to release that. That is their killer secret-sauce advantage, right? They put a lot of resources into it.

Right. Well, we have maybe 100 curators, if you add them all up across Orion at the different sites; it's funded by pharma, and we have the academic rights to publish.
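As an aside, the PheKB-style heuristic described above ("coded three times, plus a drug") can be sketched in a few lines. This is a toy illustration only: the code sets, drug list, and thresholds below are hypothetical, not an actual validated PheKB algorithm.

```python
# Sketch of a PheKB-style rule-based phenotype heuristic (illustrative only;
# the real, validated algorithms live at phekb.org). Code lists and the
# threshold here are hypothetical.

RA_ICD9_CODES = {"714.0", "714.2"}          # hypothetical rheumatoid arthritis codes
RA_DRUGS = {"methotrexate", "etanercept"}   # hypothetical RA-associated medications

def has_rheumatoid_arthritis(encounters, medications,
                             min_code_count=3, require_drug=True):
    """Call RA if an RA code appears >= min_code_count times and,
    optionally, an RA-associated drug was ever prescribed."""
    code_hits = sum(1 for e in encounters if e["icd9"] in RA_ICD9_CODES)
    drug_hit = any(m.lower() in RA_DRUGS for m in medications)
    return code_hits >= min_code_count and (drug_hit or not require_drug)

# Example patient: three RA-coded visits plus a methotrexate prescription.
encounters = [{"icd9": "714.0"}, {"icd9": "714.0"}, {"icd9": "714.2"}]
medications = ["Methotrexate"]
print(has_rheumatoid_arthritis(encounters, medications))  # → True
```

Real PheKB algorithms layer on exclusions, temporal windows, and site-by-site validation; the point is simply that these phenotype definitions are small, auditable rules over codes and medications.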
But at the end of the day there are so many restrictions; it would be good to see more publications come out of it. It's a different source of money, so it would be good to have the public side force that data out in some way. That's where public-private partnerships could maybe help.

OK. Nancy, you've been very patient.

I'd just say there's a whole literature on this: if you look at the electronic health records literature, there is very active science in this area. And the algorithms, yes, can be validated against a physician gold-standard reference. But just a reminder: the physician gold standard, platinum standard, whatever you want to call it, is not truth. It's all data, and we have to learn how to get to truth with the data that we have. It's often the case that the physician gold-standard diagnosis wants data that don't exist in EHRs; that's just what we have to live with. EHR data is complicated relative to research-quality data, but it's the data we have to do all the translation in. So we just need to swallow the bitter pill and use it: learn to find our signals in this quality of data, interpret everything in this quality of data, and take it forward, because this is all we will ever have for translation. We need to translate our research findings to this space, because this is where translation has to happen. And yes, it's messy. Some phenotypes are actually really good in EHRs; with Crohn's disease, if you've got four or five diagnosis billing codes in some period of time, it's an outstandingly high-quality diagnosis.
But look at something like type 1 or type 2 diabetes: you're asking physicians to make a distinction between types of diabetes without any of the research-quality data we use all the time for making those distinctions. So there's real disease-by-disease variation. A lot of algorithms can work that involve checking for drugs, checking for certain kinds of procedures that are common within a particular diagnosis, and text mining for negation, the "rule out" kinds of diagnoses. There are all kinds of tricks. There are also now really outstanding quantitative, probability-level diagnoses made over the whole set of diagnoses, so you get a probability distribution over many possible diagnoses instead of a single diagnosis. And that's real-world too. So I think it's not a question of either/or. We have to do this: we have to learn how to translate our research findings to this quality of information, because this is the only place we're ever going to translate anything.

Thank you, Nancy. Atul had a comment in the chat, so I'm going to put him on the spot for a moment. Can you talk about the gaps and the needs? Because you've been trying to do this, for example, within the University of California health care system. If you have a patient you've sequenced and done some molecular studies on, and that patient is going back and forth between UCSF and UC Davis, is that OK? Do all these hospital records talk to each other, or does the patient have to keep coming back to the same site, the mothership, or else the information is somehow corrupted?

Yeah, so we have a centralized database of all patient-care data across the entire University of California, all six academic medical centers, synced up every two weeks. So we take all that data. We started with federated.
There was a lot of discussion in the chat about federated, which is great. i2b2 is one such piece of software; a lot of people use it, and companies have been launched on it as well. But federated only got us so far. We used i2b2 for more than 10 years. Yes, you can count patients, but right away you want the actual data on those patients, and then you've got to go chase down someone to get it for you. So federated stalls right away. That's why we moved to centralized when I left Stanford five years ago. But, Howard, if you ask about someone who goes between Sutter, our competitor, and UCSF: of course we don't have the Sutter records in there, and they don't have ours. The clinicians can see them for patient-care purposes, since both systems happen to be on Epic, but that isn't allowed for research purposes. So there are all sorts of subtleties here.

But a huge plus-100 for what Nancy is saying. People say all the time that electronic health record data is messy data. I completely dispute that. Medical care is a messy world, and the EHR captures that mess really well. It's not that the EHR is flawed; medicine is messy. Who's actually rendering the phenotype? Is this a rheumatoid arthritis specialist who's seen these kinds of cases for 20 years, or a primary care doc who just graduated from residency and is seeing their first case? Yes, there's a nice pristine ICD-10 code behind both of them, but what does it actually mean? We're going to have to deal with that world. That world isn't going to get any prettier, but algorithms and so on are going to have to work within it. I don't know if that answers your question.

Yeah. We've had some good discussions on electronic health records; let me just ask one last question on the EHR.
It fundamentally has to do with touches to the health system, which Neil Hanchard is asking about. Once you get past 50, you're going to have much more intensive health care, but we think a lot of health is determined by what happens in pediatrics and childhood. Does anyone want to comment? Because we want to get beyond the EHR discussion, important as it is. How are we seeing touches for the first 20 or 25 years of life, for Neil?

I don't know if Neil's going to answer; I'm technically a pediatrician as well. Oh, how I wish, how I think our field wishes, we were that relevant. The average pediatrician probably sees a kid maybe every 18 months, maybe once every other year in the teen years, and then that's it: maybe a total of six encounters. I would say that bad health in childhood probably leads to worse health as an adult, but that kind of life-course science is really tough to get funded. And we know what happened to the National Children's Study and all the rest. So I don't know what else to say.

I was actually asking it in an EHR sense: will the tools and so on that were built for adults work if you wanted to do something in the pediatric EHR? For example, we work on early-onset hypertension, but it has a very different categorization than in adults. Are those tools available?

I doubt they're really generally available at all. PheKB, the one I put in the chat, tends to be a little adult-oriented. The pediatric side is much harder, probably much more captured in the notes.
You're looking at an echo, and you're looking for one field of that echo, perhaps a partial pressure over something. So it's going to need a lot more parsing of text notes. I bet that science and engineering still has to be developed.

All right, Mike Snyder has a comment; he has his hand up.

Yeah, just picking up on the pediatric stuff: I think it's really cool, and it really needs to be done. You're right that our system's not set up to do it, but it's so important, because, as you know, nutrition in the first few years of life basically imprints you for life. And we don't know how many other things are like that; quite frankly, probably behavioral things, and the whole neurological side, through all the childhood years. So we really should be capturing that information. You need a long-term, TEDDY-style view of this whole thing, where you start preconception and go out until people die. I know that was proposed and got trashed, but some flavor of it should get resurrected. If nothing else, save the samples; I think we'd learn a ton. And don't forget to bring in mental health; nothing is more formative for mental health than those teenage years.

Go ahead, Howard.

Just another question, to switch from the EHR. We've had a lot of good discussion of the EHR, but what do people think about the multiomics, the theme of this workshop? Is there a need for additional methodology? If we have all these great data sets and want to analyze them, what do people think in terms of the analytic methodology? Is it needed? Is it a priority?

Working in this space, I would say the answer is definitely yes, because there are still many challenges. There are a couple of flavors. One is the multiomic aspect: how many different modalities can you capture at the same time?
The common ones now are, for example, DNA sequence or chromatin, and RNA, and increasingly also protein. Combining those with further omic measurements on top, which is not yet standardly done at the same time or from the same sample, is one aspect. The second aspect is: can you do the same at the single-cell level, and go deep there? The third level is how many single cells you can capture, because we're talking about rare pathogenic cells. If that cell is one in 100, how many cells do you have to sequence to see it? Say it's the brain and the microglia: you have to sequence through many, many cells. So what is your throughput, and what is your cost? And remember, we have to do that across multiple samples and multiple individuals. So there are still a lot of technological challenges, and people are working on all of these aspects. One does not negate the other; you want to get better in all dimensions.

Jonathan, you have a comment. Jonathan, you might be muted.

Sorry. I was just going to follow up on what Howard was saying. It's absolutely critical that we have the methodologies and that we continue to work on them. I would put a slightly different view on it: there are really two things the methodologies need to address. One is integrating the different omics on the same samples, whether that's single cell, tissue, whatever it is. People are working on this, but there's a tremendous amount of progress that still needs to be made. Then there's the other aspect: trying to integrate the multiomics across different samples, where some omics is done on one type of sample and other omics on other samples.
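On the earlier "one in 100" point about rare pathogenic cells, the required cell count can be sketched with a simple binomial tail calculation. This is a back-of-the-envelope sketch assuming independent sampling and perfect capture, both of which are optimistic for real single-cell assays.

```python
import math

def cells_needed(p, k=1, confidence=0.95):
    """Smallest n such that P(at least k cells of a type with frequency p
    among n sampled cells) >= confidence. Assumes independent sampling
    with perfect capture; real assay yields are lower."""
    n = k
    while True:
        # P(X < k) = sum_{i<k} C(n,i) p^i (1-p)^(n-i)
        p_lt_k = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
        if 1 - p_lt_k >= confidence:
            return n
        n += 1

# A cell type at 1-in-100 frequency:
print(cells_needed(0.01, k=1))   # → 299 cells to see one with 95% confidence
print(cells_needed(0.01, k=20))  # far more to profile ~20 such cells
```

Seeing a single 1% cell with 95% confidence already takes about 300 cells, and recovering enough such cells for a usable expression profile pushes the requirement into the thousands, which is exactly where per-cell throughput and cost start to dominate study design.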
You can learn information from each of those, and we need to figure out how best to integrate all of that. This is where you get into the computational aspects: deep learning, machine learning, things like that. It's absolutely critical, and there's a lot that needs to be done there.

That's a good way to think about it. Nathan?

Yeah, I agree with everything that's already been said, but there are some really interesting challenges that come with dense multiomic data. One is that when you're designing studies, and I've been in many of these debates, people start saying: oh, you shouldn't make all these measurements, because you're going to have too many variables, and therefore too many multiple hypotheses, and we're not going to learn anything. My view is that that's a very backwards way to think about it. It's true if all you're going to do is feed the data through a matrix and run an analysis on that matrix: variables versus samples, you can't argue with that, exactly. But it's a very odd place to be, to say that the more we measure about a patient, the less we're able to understand about them. If that's true, then we're doing something wrong. So what we really need to develop around multiomics and this dense information is the ability to dive in. For one, making more measurements does not only amplify your multiple-hypothesis burden; it also error-corrects. If you make a small handful of measurements and you see one jump out, you don't know if that's a measurement error, you don't know if it's an anomaly, you don't know if it matters.
But if you have a bunch of measurements, and you understand a process, then when one measure jumps out into some really unusual space you can ask: are all the things connected to it also pulled on? Were they perturbed? Did they change? You can error-correct a lot. So that's one element. A second is that we really need to be able to analyze on an N-of-1 basis: what is happening in this particular patient? One of the issues there, and I think Phil brought it up, is that we have to understand the wellness space in detail, so that you can take a person's measurements, compare them against the wellness state, and then monitor: here's everything that's really unusual. It's really weird for this protein to be 10 times higher than this other protein, or really odd for these metabolites to be in this state, or whatever it is. And now there are these huge efforts, Google is doing one, NCATS is doing one, building massive knowledge graphs. With those, if we have dense information, especially longitudinal, on a particular person, and enough population data to know what wellness looks like, you can dive in and look at a very personalized trajectory. We're just launching a big clinical trial in pregnancy to do this in that space. You can look at these individual trajectories, and that's pretty nascent, yet it's one of the huge opportunities for analytics in the multiomic space. Not to diminish the other ones we've talked about, but it's a big one, it's not focused on much right now, and I think it's really important.

Thank you, Nathan. Let's spend the last half of our time on what we've been charged with: recommendations to NHGRI.
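The N-of-1 idea Nathan describes, flagging what is unusual for an individual against a wellness baseline, can be sketched in its simplest form as z-scores against a reference cohort. The analyte names and reference values below are made up for illustration; a real pipeline would model covariates, longitudinal trends, and the correlations between analytes rather than treating each one independently.

```python
from statistics import mean, stdev

def flag_unusual(person, reference, z_cutoff=3.0):
    """Return analytes whose value lies more than z_cutoff standard
    deviations from the wellness-reference mean. Toy N-of-1 sketch."""
    flags = {}
    for analyte, value in person.items():
        ref = reference[analyte]
        z = (value - mean(ref)) / stdev(ref)
        if abs(z) > z_cutoff:
            flags[analyte] = round(z, 1)
    return flags

# Hypothetical reference values from a "wellness" cohort.
reference = {
    "crp_mg_l":  [0.5, 1.0, 1.5, 2.0, 1.2, 0.8],
    "ldl_mg_dl": [90, 110, 100, 120, 95, 105],
}
person = {"crp_mg_l": 9.0, "ldl_mg_dl": 102}
print(flag_unusual(person, reference))
```

Here the CRP value sits far outside the toy wellness range and is flagged, while the LDL value is not; in a longitudinal setting the same comparison would run against the person's own baseline as well as the population's.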
But before we go to that, one of the important stakeholders is obviously the journals, so Tiago has agreed to make a few comments.

Thank you. Hi. For those who don't know me, my name is Tiago; I'm an editor at Nature Genetics. I'd really like to thank Howard and Judy for the opportunity to participate, not just in the workshop, but also to give a short editorial perspective. Some of the points I'd like to mention have already been alluded to; I just want to emphasize a few things. From an editorial perspective, we're getting more and more submissions that use multiomics. And from a science-communication perspective, what we often see in the feedback from reviewers, meaning you, the community, is that these data sets are usually quite overwhelming, and actually being able to distill and extract meaningful biology from them is still extremely challenging. I think that derives partly from how papers are written and how the science is communicated, but it also has a lot to do with a really dire need for more refined analytical tools. In my view, the community should invest a lot in developing new tools that can really integrate all these data sets. Benchmarking was also mentioned, very briefly, and I cannot emphasize enough how important that is. There should be real investment in developing tools to integrate these data, not just from the perspective of computational efficiency or statistical soundness, but to be able to extract biological meaning and mechanism from these data sets. I know that's easily said and harder to do.
But that's a point I'd like to emphasize: I'd encourage people in the field to really invest in creating more tools. The other point, which has already been talked about, is diversity: we desperately need to include samples from different populations and different ancestries. I think Tuli mentioned that in the GWAS space this is already being done, but in the functional or multiomic space it is, from my perspective, still lagging behind. So I would encourage everybody, as much as possible, to try to include patients and samples from individuals from different backgrounds and different ancestry groups. I understand there are challenges there that are socioeconomic and so on, but it's really, really important. And last, but perhaps not least, I'd like to be a bit provocative and play devil's advocate, not to sound like a curmudgeon, but hopefully to trigger some discussion and thought. I am routinely seeing more and more studies that I feel, and the reviewers feel as well, are performing multiomics because they can, not because they should. I think this is a natural risk, a caveat, with any emerging technology: you feel it's trendy, you feel this is where the field is going, the funders seem to require or expect it, and so on, so you go ahead and do it. And in the end you've invested a lot of time and money, and you don't necessarily understand the disease better or get anything meaningful out of it. Again, I'm not saying this as a skeptic; I am a believer in the power of multiomics. I'm just throwing this back to the community to say that it is extremely important to think about what it is you're trying to address.
What is the question you're trying to answer? I mentioned earlier that, even though they're not mutually exclusive, there are very significant differences in how you should design your study if you're just trying to identify biomarkers of disease progression versus if you're really trying to understand disease etiology, or the mechanisms that underlie pathogenesis. It's really important to consider this, because I often see papers, sometimes huge efforts, doing bulk-tissue multiomic approaches where the tissue is not relevant for the disease. At the end, you can't really explain the correlations or anti-correlations that you see, and you're none the wiser. In some cases it would be more informative to do just one or two omics technologies, but at the single-cell level and using relevant tissues or cells. So this is the challenge I would lay out: think very carefully about which technologies you need to use to address the questions you want to answer. Think very carefully about whether you need single-cell resolution or whether bulk is enough. But don't just do multiomics because it's trendy, or because you think you ought to be doing it because of where the field is heading. So, having said that, those are the three points I'd like to emphasize: more analytical tools, more diversity, and careful study design and technology application. Thank you.

Thank you. That's terrific. So, for the remaining time, our charge is to provide recommendations to NHGRI; they're all, or many of them, on the line. Again, we'll just open it up to the floor.
Judy, we thought maybe we could put up Joe's notes. Joe and Ellen have been capturing some of what we heard. Maybe put that on the screen, let people look at it, and then weigh in and see if there's anything that we missed. Does that sound good? Sounds great. OK. Can you all see Joe and Ellen's screen? Is it big enough? OK. So the recommendations we have captured at the moment are listed here. Should we go through them one by one, or can people see and offer comments? Maybe you could just add session five so it's on the next page, so we can see it in one fell swoop. Makes sense. There we go. Perfect. I don't know if the first point is related to the point that I raised about integrating data from multi-omic cohort studies with multi-omic experimental perturbation studies. Clarifying that might make it clearer. And Terry, you have a question or comment? Sure. Thank you. Oh, I need to turn on my camera, although you don't want to see me. But at any rate, studying variation among healthy individuals is kind of a challenge: while we need to understand variation within health, sometimes there's not a lot of it that relates to disease. So maybe we could get some suggestions for how to balance comparing health and disease, or what kind of diseased individuals to look at. We talked earlier about people with exacerbating-remitting illnesses; Judy showed some really nice data on that. So what do people think about how we balance that to get the most bang for the buck? Well, my reflex, Terry, is that we've had multiple people talking about that transition from health to disease. Sure, but how do we capture that? Yeah, so having spent 20 years in cardiovascular epidemiology, it takes a long time to go from health to disease.
And you have to study thousands and tens of thousands of people, unless you catch them at just the right moment, and you don't know when that is. So if there are suggestions for who might be in a transition, that would be helpful. Maybe I should respond to this, since I made the original point. I think we've actually seen in the brain, for example, although this is autopsy material, that about five to ten percent of the transcriptome varies seasonally. So I think we don't need large numbers to measure a lot of this biological variability, these biological rhythms, and that will enable us to adjust the measurements that we do to account for variability that otherwise can be confounded by our sampling scheme. So I would definitely think that with a few hundred carefully characterized subjects, we could go a long way, particularly with more diverse populations. Mike, you have a comment. Yeah, well, on Terry's point: certainly going after at-risk groups increases your probability. But I think we need it just in basic healthy individuals. And thinking about kids again, there are incredible transitions going on, from preconception all the way through probably age 20. There's just a lot going on. It's very hard, if any of you have ever done kid studies, to compare a sick kid with a healthy kid. There are just not good reference data sets out there for healthy kids. It's really a mess. So I think you do need the healthy cohorts, period. And then at some frequency, people do get disease. I originally raised my hand because, and it's sort of implied here, I think these longitudinal studies, it's there in the life course situation, but it goes beyond life course.
Longitudinal studies are extremely powerful. Right. Neil, you have your hand up. Yeah, I was originally going to comment on this idea of how you capture the transition from health to disease. Someone said earlier to take those who are at high risk, and there are certainly Mendelian diseases where you know there's a certain risk involved. Maybe those would be good starting points, ones where you think you might be able to see the changes that occur. So that's certainly one argument. I was also going to second Mike's argument, entirely selfishly, about how we don't know anything about molecular puberty, or birth changes, or any of these kinds of things, which can also be really important. And as a last point, I want to emphasize what people had said about whether there's a way to harmonize what's already there. There's a lot of that in the chat. We should be able to utilize that as more of a reference, rather than throwing out the baby with the bathwater. Caroline. Yeah, I think the two main things I was going to raise have now been captured, things from the chat and the meeting overall that didn't seem to be in these recommendations: harmonization as part of the integration need, and diversity, although that's already been added as a sub-bullet since I raised my hand. When I was reading the recommendations as they were, it was striking how they didn't match what we've been hearing all day. Yeah. Thank you. Alison. I just wanted to suggest, among the healthy groups, healthy aging, right? The oldest old who are healthy overall are going to be helpful in thinking about protective factors against all diseases. I think we all would like to age healthfully.
So we all should have a vested interest in that one. I agree, Alison. Is there a centenarian study out there? I mean, there are, yes, but I don't know whether people are doing proteomics or multi-omics in them. But those would be valuable cohorts; centenarians may provide valuable information. Mike is being negative and saying it's just inevitable that we transition to disease, that there is no such thing as healthy aging. Well, you get two birds with one stone: you get aging, and then you also get transitions to disease. The iPOP study is a great example of that, and I think Nathan's work as well. Yeah. I mean, we've had some discussions about feasibility, and about efforts that have failed in the past versus what we really need to do. What I'm hearing is that kids are hard, but important. It really is a time of enormous transition and change; you want to talk about transitions. Tuli, you have your hand up. Yeah, just a suggestion, you can ignore it if you want, but I would put as a specific recommendation to take more advantage of the All of Us Research Cohort. It's my perception that some of the tests that were taken off the table were taken off solely due to funding. One could imagine more immunology measurements, more methylation measurements, more single cell measurements, if there were more funding there. And the samples are there, the electronic health record data is there, they're using Fitbits, they're connected with smartphones, and it's a 10-year study. So I would specifically put the All of Us Research Cohort as a recommendation. I agree. Nathan, or I forget who was first. Rachel, then Nathan, and then Tess. I was just going to say, when we're looking at longitudinal data, I think it's important to also look at the longitudinal course of a disease.
So I'm thinking of chronic diseases like asthma, which is what we primarily work in. The important time is really, I'm sorry, it's my cat, the exacerbation period, when someone is actually in active disease, as opposed to someone with something like asthma who, for the most part, at least in what we've found in our omic studies, is pretty similar to a quote-unquote healthy person, except for that period of exacerbation. So I think it's useful to try and capture those particular periods within a disease, rather than just considering people as either healthy or diseased. And to go back to Tess's presentation earlier, I think the concept of endotypes of diseases is really important, and something that moving forward would be really good to consider for most diseases, because so few diseases are really a homogeneous disease state. There are mechanistically different subtypes within those diseases, which I think are important to study. Great, thank you. Nathan and then Tess. Yeah, I was just coming back to the earlier comments: there are certainly big studies of centenarians, like the longevity consortium at NIA, which I used to be a part of, that does a lot of those kinds of studies. And I'll just follow up a little on what Mike said as well: you can certainly age healthier, but there is, at least at the moment, no aging without disease. So big studies that focus on healthy aging, certainly; obviously I have a whole company called Onegevity now, so I'm very interested in that. But as we get into that, you absolutely will see these disease transitions, and as Tuli brought up, the All of Us program is probably underutilized by a lot of us, because you absolutely need as much multi-omic longitudinal data as possible.
And I've brought this up before, so I'll just share my point of view really quickly: if you consider the challenge of trying to reverse engineer the human body, the amount of dense data we have is minuscule compared to it. And with these longitudinal multi-omic data sets, you can go back for analysis over and over and over again. With the data set that we developed, I just can't tell you how many times we have an interesting question and we have an answer the next day, because we have 5,000 people with longitudinal multi-omic data. And as we expand that, the number of questions that will be possible to answer, especially as it's opened up to lots of researchers, is really an incredible opportunity. So I'll just say that. Thanks. Tess. Thank you. Rachel addressed some of my questions as well. It may be more disease-specific, but seasonality in terms of asthma, the pollen season or the mold season, indoor air pollution, all of those are really critical factors, including inner-city versus other settings; those kinds of factors actually contribute a lot beyond just the omics. So I think disease-specific environmental factors, the built environment, are critically relevant in this area. Thank you. Thanks, Tess. So let's get three last brief comments and then five minutes for the NHGRI wrap-up. So Ji Hong, Tuli, and Ben. Okay, thank you. I would like to talk about longitudinal data. I think there are a lot of advantages to a longitudinal design, but at the same time there are a lot of challenges. I spent the first 10 years of my career working on longitudinal data analysis, method development and application. One thing is, if you think about the UK Biobank data, even though the sample size is half a million, the number of cases of any given disease is small. For example, if you look at lung cancer, there are only about 2,000 cases.
And so therefore, when we design a multi-omic study, we have to be smart. If we do random sampling, then for any given disease we can end up with a very small number of people with the omic data. So for the first bullet point, I would like to say not only application of omic data, but also proper design of omic studies. The second thing I want to mention is that many current GWAS, whole genome sequencing, or omic studies mainly focus on case-control designs. Longitudinal studies give us a new opportunity to look at the genetic underpinnings of age at diagnosis, basically to look at survival data. It will be useful to leverage that information, and it is particularly helpful for identifying the genetic underpinnings of early onset of disease. And the third part is that longitudinal analyses have many unique challenges, in particular dropout. If one just ignores missing data and the mechanism of dropping out, then the analysis is likely to be misleading. So one needs to take into account the different missing data mechanisms, for example, whether the data are missing at random or not missing at random, and incorporate the statistical tools to properly address the missing data mechanism, in order to make the omic analysis valid. Thank you. Tuli? Yeah, I wanted to very quickly raise the point that I talked about yesterday, which is biospecimens. A solid one third of the things on this list would currently be applicable only to blood, because that's the only biospecimen that can currently be collected at scale. There are important practical considerations here, but I think this is something where we should try to push the boundaries and think about what one can do with highly scalable, non-invasive sampling of non-blood tissues.
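The dropout point Ji Hong raised above can be illustrated with a small simulation. Everything here (the biomarker model, the dropout rule, all the numbers) is hypothetical, a sketch of why informative dropout misleads a naive longitudinal analysis rather than a description of any real cohort:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 5000, 5  # subjects, visits

# Hypothetical biomarker: subject-level baseline plus a true slope of 1.0 per visit.
baseline = rng.normal(0.0, 1.0, size=(n, 1))
visits = np.arange(t)
y = baseline + 1.0 * visits + rng.normal(0.0, 0.5, size=(n, t))

# Informative ("not missing at random") dropout: subjects with high values
# at the previous visit are more likely to leave the study.
observed = np.ones((n, t), dtype=bool)
for j in range(1, t):
    p_drop = 1.0 / (1.0 + np.exp(-(y[:, j - 1] - 2.0)))
    observed[:, j] = observed[:, j - 1] & (rng.random(n) > p_drop)

# A naive complete-case analysis averages only the remaining subjects,
# so later visits are enriched for low-baseline "survivors".
naive_means = np.array([y[observed[:, j], j].mean() for j in range(t)])
true_means = y.mean(axis=0)
print("true means: ", np.round(true_means, 2))
print("naive means:", np.round(naive_means, 2))
```

Under this setup the naive means drift below the true means at later visits, flattening the apparent trajectory, which is exactly the misleading result the comment warns about; methods that model the missingness mechanism avoid this bias.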
What can we do with biopsies in hospital settings? Can we have better, highly scalable iPSC differentiation protocols? We can do better than blood if we actually put effort into it. And I think it's going to be tremendously important if we actually want to apply these insights to diverse diseases and understand more diverse biology. Yeah, I think connecting diversity with iPSCs is a no-brainer. Ben? Yeah, I think we probably need proper validation strategies, either in vitro or in vivo, to validate either mechanisms or biomarkers at scale. The outcomes can then be used to help refine or improve the methodologies for integrating multi-omics. Integration can normally give you tens of thousands of hypotheses, and it's hard to test each of them, so we have to have proper strategies to validate. So I don't know, Howard, do you want to make any comments, trying to read through different locations simultaneously, before we hand it over to NHGRI to close? Yeah, so thank you everybody for all this great input and ideas. I'd love to hear from the NHGRI folks whether they have something prepared, and whether we have enough time for another round of discussion. Any thoughts? I think, Howard, if you wanted to take two to three more minutes, we both have just one minute of closing thoughts. Okay, yeah. So it turns out that I was looking at some prior meeting records, and in 2017 a number of the investigators funded by the Centers of Excellence in Genomic Science, which is another NHGRI program, got together for a workshop, and we proposed a set of recommendations for the field of epigenomics in precision health. Now, epigenomics is obviously one slice of multi-omics. That was published in Nature Biotechnology in 2017.
And so now, maybe three or four years later, it's actually interesting to look back and see what we proposed, what has been done, and what remains an ongoing need. So if I may, I'm going to try to share my screen and pull that up. Okay, I hope you can all see that. So this is the recommendation that was published, and I'm just going to focus on the figure. These are some of the challenges that were discussed, including our need for standards. At the time, and I think it's still true, there were multiple technologies being developed for epigenomics, and that is certainly true for multi-omics. We suggested that perhaps there should be some standard, either a spike-in or a standard cell line or cell system, on which different technologies would always be implemented. If you're going to collect a large cohort dataset, you would also run your assay on that same standard, and that would make it possible to compare across different studies. At the time, one of the suggestions was to use one of the ENCODE tier 1 cell lines, for example the lymphoblastoid cell line, because we have a huge amount of measurement information about that line. I think this remains a valid idea: if we're talking about new multi-omic technologies, whether single cell or whatever new analysis pipeline we're going to use at scale across patient cohorts, we still need standards so that batch effects and other variation can be corrected and these different studies can be compared. But I would add to that recommendation that rather than using a single line like a tier 1 cell line, we should take advantage of the additional lines and resources from the HapMap project and have more ancestry diversity and gender diversity, because you could implement that idea about diversity in the standards that we build into our technologies.
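The shared-standard idea above, assaying the same reference material alongside every cohort so studies can be compared, can be sketched as a toy calculation. The cohort names, the additive batch model, and the simple median-shift correction are all assumptions for illustration; real harmonization pipelines use much richer models:

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 200

# Hypothetical true profile of a shared reference standard (e.g. a cell line
# assayed by every study), plus an unknown additive offset per batch.
ref_truth = rng.normal(10.0, 2.0, n_features)
batch_offset = {"cohort_A": 0.0, "cohort_B": 1.5}

def assay(profile, offset):
    """Simulate measuring a molecular profile in a given batch."""
    return profile + offset + rng.normal(0.0, 0.1, n_features)

ref_in_A = assay(ref_truth, batch_offset["cohort_A"])
ref_in_B = assay(ref_truth, batch_offset["cohort_B"])
# A cohort_B sample whose true profile differs from the reference by +0.5.
sample_in_B = assay(ref_truth + 0.5, batch_offset["cohort_B"])

# Because both batches measured the same reference, their difference
# estimates the batch offset; subtracting it anchors cohort_B to cohort_A.
correction = np.median(ref_in_B - ref_in_A)
sample_harmonized = sample_in_B - correction
print("estimated batch offset:", round(float(correction), 2))
```

After the subtraction, the cohort_B sample's remaining difference from the cohort_A reference reflects biology rather than batch, which is the point of running every new technology on the same standard.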
The second recommendation was about standard computational pipelines; we certainly heard a lot about the importance of data integration and data analysis. The third recommendation was specific to epigenomics: a database of regulatory elements. That, I think, is already being done by the ENCODE consortium, so that is one check off the list. I would love to hear some feedback or reactions to this set of ideas. Sounds good. I think for the references, you could also add age diversity, I suppose, to all of those too, right? I can chime in, Howard. I think the cell lines are fine, but having reference tissues as well, since the more complex identification of regulatory elements that are going to be most relevant for SNPs will require benchmarking against some standard tissue, I think would be useful. Right. Yeah, the challenge there is that you need this resource to be self-renewing and inexhaustible, right? Because you're going to eventually use it up, and then what happens to that standard? Sure. I think it's definitely worth going back to that 2017 set, not repeating it, and making sure we're advancing from 2017. Yes. And I want to add that both Mike and Joe Ecker were part of the 2017 recommendation, so thank you again, both, for your input. Howard and Judy, we really can't thank the two of you enough for co-chairing this workshop, and thanks also to the entire planning committee. We got a chance to get to know some of you better, and you were all just fantastic. And before I give a few logistics to wrap up, I wanted to see if Dr. Green wanted to say anything. No, I wanted to just echo my own thanks. I've tuned in in bits and pieces, but listened to the last hour in particular, and we're getting a tremendous amount of feedback.
Now the hard part is for us to synthesize it and really strategize, both internally and with various advisory groups, including our advisory council, about exactly what an initiative might look like, or what the next steps should be in thinking about the possible development of an initiative. But you've given us tremendous amounts to think about, so I can certainly promise you we'll have significant amounts of internal discussion. We'll follow up. Thank you, Eric. Yeah, so I put in the chat that we will follow up in a couple of weeks with a draft workshop report, and potentially get some additional feedback on these recommendations at that point. I think we're firing on all cylinders and got a great list of things together. And we couldn't close without thanking again our entire AV and communications team for all the work they did. I want to give a special shout-out to Joannella Morales, Marie Brennan, and Laurie Finley, who really did the lion's share of the work pulling this all together. So thank you all, have a wonderful weekend, and we will let you know as soon as this information is available on genome.gov. Thanks all. Happy Juneteenth. Thank you. Thank you, Judy. Thank you.