So are we starting on time, Erin? We've done an amazing job. Everyone has done an amazing job of being on time. Yes, I completely agree. We would absolutely love to start on time, Ajay. So I think as long as we have our panelists here, they can turn on their cameras. Erin, I want to back up and prepare a few slides. Great. I'll turn it over to you, Ajay. For those of you who don't know Ajay, he is a terrific colleague, a program director at NHGRI in the Division of Genomic Medicine. He oversees a number of large trans-NIH programs, including LINCS, and has significant involvement in the HuBMAP program as well. Thanks, Ajay. Over to you.

Thank you. Yeah, I was worried that I would have to introduce myself, which is always a bit weird. We are going to switch now from individual long, detailed presentations and various keynote-style vision statements to very brief presentations followed by a panel discussion, so please be ready with all of your questions. We have three panelists: Dr. Sarah Teichmann, Dr. Lana Garmire, and Dr. Neil Hanchard. They have been asked to speak for five minutes each and to address three basic questions: Where do we want to be? What is the aspirational goal? And what are the barriers and opportunities? As I said, each of them will get five minutes, and at the end of four I'll briefly pop in and give a one-minute warning.

First up is Sarah Teichmann. Dr. Teichmann is the Head of Cellular Genetics at the Wellcome Sanger Institute. Her research focuses on four complementary study areas: transcriptional regulation and gene expression, single-cell genomics, immunology, and protein complexes. Dr. Teichmann is co-founder and a principal leader of the Human Cell Atlas international consortium, which aims to create comprehensive reference maps of all human cells to further understand health and disease. Dr. Teichmann, do you have slides?

Well, I have prepared a few, actually, just after listening to the talks earlier. Great. Thank you, Ajay. Thanks for inviting me; it's a real pleasure to be here. Of course, I'm part of the HuBMAP consortium, which Ajay co-organizes via the NIH Common Fund, so I'm somewhat familiar with certain of NIH's modes of investment, and it's a real pleasure to speak about my feeling about future opportunities. I just want to put up a few bullet points and then elaborate on why I think they're valuable. Single-cell multi-omics plus electronic health records, in a longitudinal manner, will, I think, give insights into disease progression at a very high-dimensional, detailed cellular and molecular level, and I'll explain why I think that based on recent COVID research. Further, we've heard about resources that are genotyped at scale; UK Biobank is the one I'm most familiar with, and there are counterparts here in the U.S., so to speak. Combining these modalities with cohorts that are genotyped or whole-genome sequenced would then allow you to layer your single-cell multi-omic interrogation of the tissue or fluid you're studying on top of the clinical metadata and the genotype data. We've heard things along these lines in the earlier talks; this is really just a more specific suggestion. And then there's looking at disease versus control cohorts, which I'll give an example of in COVID.
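In practice, the layering Dr. Teichmann describes amounts to joining per-donor molecular summaries, genotypes, and clinical records on a shared donor key. A minimal sketch of that join, with entirely hypothetical tables and column names (none of this comes from the talk):

```python
import pandas as pd

# Hypothetical per-donor cell-type fractions derived from single-cell multi-omics.
cells = pd.DataFrame({
    "donor_id": ["D1", "D2", "D3"],
    "t_cell":   [0.31, 0.22, 0.40],
    "monocyte": [0.12, 0.30, 0.08],
})

# Hypothetical genotype summary per donor (e.g., genetic principal components).
geno = pd.DataFrame({
    "donor_id": ["D1", "D2", "D3"],
    "PC1": [0.01, -0.43, 0.22],
    "PC2": [0.15, 0.08, -0.30],
})

# Hypothetical longitudinal EHR extract: one row per clinical visit.
ehr = pd.DataFrame({
    "donor_id":   ["D1", "D1", "D2", "D3"],
    "visit_date": ["2020-04-01", "2020-05-01", "2020-04-10", "2020-04-20"],
    "severity":   ["mild", "moderate", "critical", "asymptomatic"],
})

# Layer the modalities on the donor key; keeping the EHR long-format
# preserves the longitudinal structure for modelling disease progression.
merged = ehr.merge(cells, on="donor_id").merge(geno, on="donor_id")
print(merged)
```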
It would be remiss of me not to say that, as a healthy reference for the cells and molecules in our body, I'm obviously partial to the Human Cell Atlas that Ajay mentioned. There are some members of the Human Cell Atlas community here, and I invite all of you to join as well if you're interested in a healthy reference map of the human body. The technologies we're using to map the tissues are single-cell genomics and spatial transcriptomics, spatial methods, coupled and stitched together by computational methods. I mean, Tuuli is right that the Human Cell Atlas is not there yet, as it were; it's not ready. But there is a significant amount of data that's fully open access in the data coordination platform, and there's a lot of additional data under managed access. You can see examples of numbers for major tissues and organs here: the kidney, half a million cells; skin, over a million; airways, many millions of cells in suspension data; the gastrointestinal tract; human developmental stages and tissues, many millions of course, because they're so varied; liver, half a million; and so on. These are just a few examples. And computational methods for integrating these data and placing them into space are rapidly developing, using things like cell2location from Omer Bayraktar's group, where the suspension data is mapped onto the spatial transcriptomic data to reconstruct tissue architecture at single-cell resolution and full transcriptomic breadth.

I'll skip on to why I think this can be valuable in disease and what I've learned through COVID-19, and that is studying blood, which is one of the tissues in the body that's very accessible, like urine or saliva or other things that have been mentioned, or potentially also biopsies. We got together with three centers in the UK. We didn't have much longitudinal data, but we were capturing COVID patients of different disease severities: asymptomatic, mild, moderate, severe, and critical. Basically this allowed us to define cell types and gene expression signatures in a lot of detail and resolution, and to define different cellular and molecular signatures of the different cohorts. Now, these are not longitudinal; as I said, they're snapshot captures. This wasn't an eight-year project like Mike's; it was a few-month project. What I realized was that in the UK many of these samples are collected under so-called bioresource ethics, where you can apply for access and go into the electronic health records. This cohort was 140 patients in total. They're all SNP genotyped, and they all have electronic health records, and basically we were only scratching the surface in the Nature Medicine paper that we published a few months ago. It's such an exciting time; there's such rich data for coupling different modalities at the genetic level and at the multi-omic level. This is CITE-seq, so you've got protein, you've got RNA, you've got VDJ sequencing, and you've got electronic health records. There's huge potential here. So I'll leave you with this slide again about what I think is the future of medical functional genomics, essentially, where you're not just looking at a healthy reference state of the biology but also wanting to go into disease. Thank you. I'll stop there.

Thanks, Dr. Teichmann. Dr. Garmire. Dr. Garmire is an Associate Professor
in the Department of Computational Medicine and Bioinformatics at the University of Michigan. She leads a multidisciplinary team engaged in computational and experimental human genomics and works on translational bioinformatics, with research interests in single-cell sequencing and bioinformatics, integration of omics and clinical data, and high-throughput methods to study non-coding RNA. Dr. Garmire.

Hi, everybody. Thank you for sticking around. It's a great pleasure to be in this workshop and to learn from colleagues about the most recent technologies, assays, and ideas, as well as to help plan the next phase together for NHGRI. As was mentioned in my introduction, my group is really a computational group; we focus on translational bioinformatics. In fact, when I started my research group at the University of Hawaii about eight or nine years ago, we started with multi-omics integration as the overarching goal. Now, after eight or nine years, our overarching goal is still multi-omics data integration, but we have updated it somewhat to include more than multi-omics: other modalities of data as well. In the lab, for those who are not familiar with me, we do multi-modal patient survival prediction, supported by one R01. We also do a lot of integration work at single-cell resolution, developing informatics tools. We also think about, going forward, how we can make genomic models actionable, and we develop drug repositioning pipelines and methods computationally. And lastly, I have a personal and professional interest in pregnancy adversities; I have another R01 funded for a multi-omics based preeclampsia biomarker study. So as you can see, our research really is centered around multi-omic, multi-modal data integration.

From the data science perspective, I can tell you that currently there are a lot of methods out there that claim to do data integration, but if you examine them very carefully, you realize that most of these methods are not supervised by phenotypes. Rather, they work bottom-up, integrating different kinds of genomics data and then trying to identify coherent patterns across the different genomics data. For example, A here is a kernel-based method, B at the bottom is a neural-network-based method, C is a correlation-based method, and D is a mathematical framework called matrix factorization. All these methods have one thing in common: they are unsupervised, starting from the heuristics of the genomics data. So it occurred to me that something is lacking: why can't we use the phenotype as part of the rules to help decide what the outcome of the prediction should be? Based on that consideration, our contribution to the community has been to develop several versions of a deep-learning-based method we call DeepProg. It is a multi-omics data integration framework supervised by survival; we use survival as the phenotype. Without going into a great amount of detail, this method uses an ensemble approach where you take different genomics data as input and do a bunch of preprocessing. For each genomic data type you use autoencoders to reduce the dimensionality of the input features, and then you link those transformed hidden-layer features with survival to identify informative survival features.
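As a rough sketch of the step just described, per-omic autoencoder compression followed by survival-based screening of the hidden features, consider the following. This is a paraphrase of the published DeepProg idea, not the authors' implementation; the layer sizes, the univariate Cox screen via lifelines, and the 0.05 cutoff are all illustrative assumptions, and the data are simulated.

```python
import numpy as np
import pandas as pd
from tensorflow import keras
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000)).astype("float32")  # one omic: 200 samples x 1000 features
time = rng.exponential(365.0, 200)                  # follow-up time (days)
event = rng.integers(0, 2, 200)                     # 1 = event observed

# Autoencoder: compress the omic layer to a 100-dimensional hidden representation.
inp = keras.Input(shape=(1000,))
hidden = keras.layers.Dense(100, activation="tanh")(inp)
out = keras.layers.Dense(1000)(hidden)
autoencoder = keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

encoder = keras.Model(inp, hidden)
H = encoder.predict(X, verbose=0)                   # transformed hidden-layer features

# Survival supervision: keep hidden features associated with outcome
# in a univariate Cox proportional hazards model.
keep = []
for j in range(H.shape[1]):
    df = pd.DataFrame({"h": H[:, j], "time": time, "event": event})
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    if cph.summary.loc["h", "p"] < 0.05:            # illustrative cutoff
        keep.append(j)

H_informative = H[:, keep]                          # survival-informative features
print(H_informative.shape)
```

In the full framework, these informative features would then feed the clustering and classification steps she describes next.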
After that, we use a clustering approach to identify the optimal number of clusters in the cohort, and then build a classification algorithm. Because it's a supervised method, it can then be applied to predict on many other cohorts across various genomics data types. So this is the framework of the methodology. We tested it initially on liver cancer: we built the model using TCGA and applied it to a variety of other population cohorts, very heterogeneous populations from various ethnicities, across different omics measurements, whether RNA-seq, microRNA, DNA methylation, et cetera. The results show that this model is really robust across all the populations, all the cohorts, and all the genomic measurements. Beyond that, we also expanded this framework to the 32 TCGA cancer types, to stratify how many patient survival subgroups we can identify, and it turns out most of the cancers can be optimally split into two survival risk groups. We are hoping this work will be useful for clinicians as a first-line reference to predict what their patients' survival status could be. This work is expected to be accepted by Genome Medicine soon. So that's one example of what we have done on the computational side.

Overall, from my perspective as a data scientist working in this field over many years, here are some points I would like to raise in terms of challenges in data integration. One challenge is that there are very complex confounders that people may not even be aware of or take into account. We talked earlier about cell-type and tissue-type heterogeneity; to address that, you need to use deconvolution methods. However, in the process of deconvolution we have found that those methods are not reliable and the reference data sets are not really great. It's a whole other can of worms that we opened up recently; I'll show you a slide at the very end about this issue. There are other confounders that need to be addressed in a study as well, for example phenotypes such as age, ethnicity, treatment, et cetera. For TCGA especially, which we have worked with a tremendous amount, the treatment annotations are really limited, and that limits how powerfully we can use this kind of public repository for new discoveries, treatment suggestions, prediction, et cetera. Next, in my view, a lot of methods claim to be the best in their own papers, but there really aren't enough unbiased benchmarking studies; I think the field of benchmarking deserves a lot more attention, and that relates to the issue on my next slide. Also, as many people have already mentioned, the next challenge is how you integrate different types of omics data with other data modalities, for example imaging data and electronic health record data, and further, how you integrate data across scales, from single-cell resolution to bulk resolution to population resolution. These are all challenges, and at the same time they present themselves as opportunities for us all.

So here's the issue we have right now. Do I have time? No, actually. So do you want a few seconds to summarize? Yes, I just need a few seconds to illustrate that the problem is really non-trivial. You start with a placenta cohort from my collaborator, over 300 samples, and they are assayed for both gene expression by RNA-seq and DNA methylation.
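For context on the deconvolution step being compared here: reference-based deconvolution of a bulk profile is often posed as non-negative least squares against a cell-type signature matrix. The sketch below is a generic textbook version with simulated placeholder matrices, not the specific references or tools used in this study:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n_genes, n_celltypes = 500, 6

# Signature matrix: expected expression of marker genes per cell type,
# e.g. derived from a published single-cell placenta atlas.
S = rng.gamma(2.0, 1.0, size=(n_genes, n_celltypes))

# Simulate one bulk sample as a known mixture of the signatures plus noise.
true_props = np.array([0.35, 0.25, 0.15, 0.10, 0.10, 0.05])
bulk = S @ true_props + rng.normal(0.0, 0.1, n_genes)

# Non-negative least squares fit, renormalised to proportions.
coef, _ = nnls(S, bulk)
est_props = coef / coef.sum()
print(np.round(est_props, 3))  # should roughly recover true_props
```

The disagreement she shows next arises when two different signature matrices, one RNA-seq-derived and one methylation-derived, yield incompatible proportion estimates from the same cohort.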
Because both data types were available, we used two different approaches to try to deconvolute the data. One was to use Sarah's data from the 2018 Nature paper on placenta; we used that reference to deconvolute the major cell types in placenta, and this is the distribution of different cell types estimated from the RNA-seq data. However, if you use another reference that claims to provide cell-type-specific DNA methylation markers, you get a very different picture of the cell proportions. Which one is true and which one is false? I don't have a definitive answer, but I do think there are limitations in the bulk-level cell-type-specific DNA methylation markers. We tend to think the single-cell RNA-seq based reference is better; we also have some unpublished placenta data from our own lab that matches pretty well with the cell-type proportions in Sarah's paper. With that, I'll end my discussion of the challenges we face. Okay, thank you.

The next presenter is Dr. Neil Hanchard. Dr. Hanchard is a clinical investigator within the Medical Genomics and Metabolic Genetics Branch of NHGRI's Division of Intramural Research. His research interests lie in the use of genomic and genetic tools to understand complex pediatric diseases and traits in diverse populations. Dr. Hanchard aims to understand the molecular mechanisms that lead to disease and to catalog related genes, biological pathways, and mechanisms. Dr. Hanchard, thank you for starting to share already.

Thanks, Ajay, and thank you for the opportunity to share this. Being the last speaker at the end, the big question, where do we want to be, is kind of a fraught question to ask, but the talks today have been so good and stimulating that there's actually some good data to draw on. I'm coming at this from a physician-scientist, clinician, translational-research viewpoint. I started from a review I read many years ago in which the omics are represented as two-dimensional entities that together give rise to a three-dimensional body, and I thought it was really useful; of course it could be updated nowadays with additional omics, plus environment and diet and EHR and all the other things we've heard about today. In the ideal world you'd have that available for all clinical states, right? Whether that's health or disease, you've got this multi-planar viewpoint of each one, and in the ideal world you would have it available for differing disease states, different ages, and different ancestries. From where we are now, obviously, there's the issue of how that would work as a reference, since you would have to harmonize the data, and there are governance issues. But I also thought about going forward from where we are now in terms of the three areas we've discussed a bit more today. From a data integration standpoint, the real problem, and I'm not a statistician, but even I can see it, is that there are unique challenges to integrating these kinds of high-dimensional, large data sets across multiple planes, right? And so this becomes a question of how you can use this kind of data for generating hypotheses, for identifying a biomarker or patterns of biomarkers that can be utilized to understand health and disease states.
But on the other hand, there is potentially lower-hanging fruit: taking the relationships we know very well right now, the genotype-phenotype or transcriptome-phenotype relationships, and using this kind of data to trace the causal pathways and understand more about the disease mechanisms we intervene on. From a technology standpoint, again, I'm a physician-scientist, often collecting cohorts of children in far-flung parts of the world, and one of the real limitations is being able to assay all of these omics from a single aliquot of sample. Obviously the single-cell world is moving toward being able to assay all of these layers of omics from a given sample, but you could even think about it one step above as you collect these cohorts: it's really helpful if you can collect one sample from an individual that covers everything, rather than one sample for RNA, one for DNA, and one for metabolomics. When you're dealing with kids, for instance, where there's a limit on the amount of sample you can take, this comes particularly to the fore. But I think all of these things relate to study design, and to what we've heard today about the utility of longitudinal study designs: being able to understand temporal, intra-individual, or intra-disease variability, to use those insights to understand trajectories and build new biomarkers, and even to think of time points as pre-intervention and post-intervention. The problem here, of course, is cost, right? It costs a certain amount of money to recruit one individual and do all of these studies on that individual; you multiply that across multiple time points, and multiply multiple time points by a cohort of individuals, and it gets expensive. But I do think some of the approaches in technology will heavily inform this, like getting down to single sampling, or, with data integration, identifying the particular modalities that will be most useful for your particular study question, so that you can use imputation where it's needed and make the whole effort a bit more tractable. That's really all I had, and I'm certainly open for discussion at this point.

Thank you, Dr. Hanchard; you made up for time. Great, so we are now open for discussion and questions to the panelists. I see there are a few questions in the chat. Does anybody... okay, there's Rachel. Rachel, your hand is raised. Go for it. We can't hear you; you're muted.

Sorry. I was wondering what the panelists thought, given what we know about the limitations of many of these omics technologies. I work in the field of metabolomics, so I'm particularly thinking of that: we really don't have anywhere near full coverage of the metabolome. There's a huge amount of missing information, a lot we just don't know about the metabolome, and I think the same can be said for many of the other omics.
So how well do we need to understand each individual omic before we try to integrate them, and is integration even possible before then? With the metabolome, I think we're a long way from being able to capture the whole thing, so how far can we go with integration before we fully understand each individual omic?

If I may, Rachel, hi, this is Lana. I actually work in the metabolomics space quite extensively, so I think I can share my thoughts. Metabolomics data are actually the closest to the phenotype, so it depends on the purpose of your study. If it's for biomarkers, I would say metabolomics data are the best measurements for biomarker candidates. In terms of data integration, if you look at correlation, yes, there might be some issues integrating metabolomics data with other data types, but I think you can get around that. If your purpose is diagnosis or prediction, you can definitely use a platform like deep learning, or even just renormalize all of the input features by scaling them to be uniform, stack them together, and do the prediction. However, in terms of understanding the biological mechanism, that's a different story, because with metabolomics you really don't get a lot of measurements; you're missing a lot of things and only see the tip of the iceberg. But the hope is that by looking at other omics, like transcriptomics, you can get some trace of where things are happening, right? The enzymes that convert one metabolite to another. If there's consistency, then it shows some sort of trend, and you can work heuristically that way.

Yeah, I agree that you do have potential, where we are now, with integrating, depending on what your question is. If you're interested in patterns of biomarkers, then you probably have a resolution where, combined with the other omics, you'll get what you want. But if your goal is really to understand the deep detail of a particular metabolic pathway in a particular disease, then you might be a bit more stuck, and you need to spend more energy understanding exactly what is there. So where we are now is highly useful for me, for instance, but maybe somewhat less useful for somebody who's targeting a specific drug pathway.

I think those are great answers; I don't think I need to add anything. Thank you. So there's a question in the chat about how to create benchmarks. I think it was also on one of our slides, and Joe Ecker expanded it a bit to include both experimental and computational methods. What are people's thoughts on benchmarks? I mean, it's the sort of task that no one wants to take up.

I somewhat disagree, actually. Having been on a lot of experimental and computational benchmarking papers over the decades, they end up being extremely highly cited. The Human Cell Atlas community's Standards and Technology Working Group has done a few of them, and the community is incredibly grateful for that kind of work. I think the challenge is that it's a lot of work and effort to implement other people's methods, either computationally or experimentally, and it can be tough to do. But it is very worthwhile. One of the challenges is having a gold standard and a reference data set and so on, but definitely, I think that type of work is really valuable. Judy has a question in the
chat. Also, in fact, this is really an area I'm starting to pay attention to, because I have this problem myself as a computational biologist: we run different computational deconvolution methods and get vastly different results, for example across different cancer types. So it comes to this question: are those reference data good enough? Those data were measured in a certain context, a certain microenvironment; do they really apply to other scenarios, other RNA-seq data? I don't know. I think they need tissue specificity or cell-type specificity, so my feeling is that the reference data need some updating. As for the computational methods themselves, there are various methods developed for different omics spaces, RNA-seq, DNA methylation, and now from single-cell RNA-seq data to bulk RNA-seq data, et cetera. This area of research is very much underappreciated by folks in the computational biology domain, because some people just don't think it's novel. It's like you're trying to be a policeman, and you oftentimes get criticized: one method performs better than the other method, et cetera. So it's hard work. As Sarah said, you have to understand each method well enough to make a really fair comparison. I feel this falls into the engineering domain: you need comprehensive benchmark data, you need to collect the methods, the methods need to be well documented and easy to download and run, and then you need good metrics to evaluate them systematically and without bias. All of those are engineering concepts, and people don't necessarily appreciate that. So I'm trying to use this voice to raise awareness in the community, and hopefully the funding agencies can help support research in this area. It is actually research.

Okay, thanks. I think there are a whole bunch of hands up. Howard had his hand up first, I believe, and people are sending me direct messages about having questions, so let me set an order: Howard, Lindsay, and Tiago.

Thanks, Ajay. This is a question for everybody. We've heard about these various study designs, very ambitious, very carefully collecting multiple data modalities, and my question is: what happens when you have gaps in the data? How good are current methods at dealing with that? Imagine that you start, and then some individuals are missing certain data types or missing some time points. In this kind of framework, are you always limited by your weakest link? Because if that were the case, as soon as you have one gap you cannot use that patient, and that makes the study design very challenging and also very expensive, because everything has to be perfect. What are ways of dealing with data imputation, ways of addressing this kind of issue? Maybe this is an area of challenge; can you speak to that?

I was thinking about that more on the EHR side than on the multi-omics, genomic side, but it could equally apply to the genomic side. I think it sort of comes out in the wash in terms of imputation: if the numbers are large enough, then you can kind of smooth over missing values. That's why one of the things I emphasized for this type of disease research, because of the challenge you're mentioning, Howard, but also because of simple inter-individual variability and so on, is that
numbers are important. Numbers are going to be important, and that's different from the sort of healthy-reference work Sarah was describing.

I agree. I also think we can't allow perfection to be the enemy of the good, or the good enough, in the sense that if you have multiple modalities on multiple individuals and you can get past that numbers issue, then you can make a good guess, provided the remaining data you have is rich enough. If everybody is missing five of the eight modalities, that's problematic, but if you've got a single missing hole per person, that's something you can smooth over, particularly as the data sets get larger and you understand what goes with what, in a certain sense.

I think this speaks to the point you're making, Neil, about cost: multiple modalities, multiple time points per person. If we actually can deal with data gaps, then maybe for certain modalities we collect only every other time point; we have a lot more degrees of freedom in what we need to do, and therefore we can sample more people or more time points.

Howard, as far as I know, there are some statistical methods, like group factor analysis, that you can use when large chunks of data are unavailable for certain subsets of your sample. I don't know how well they perform in practice, but they claim to be able to handle these sorts of situations.

Yeah, I think the hard part is the dimensionality, right? You can impute genome data, but if you're imputing transcription data, you also have to take into account that you have methylation data that can help with that, or genome data that can help fill in the gaps. So the dimensionality might in some sense actually work in your favor, if you have enough of each.

So we're actually doing exactly what Howard is talking about. For some of the samples we have, there is no RNA-seq data available at all, completely missing; that's the extreme case. You only have the methylation, and you need to impute everybody's RNA-seq data. What are you going to do? You just have to work with what data are available: you try to use them as a training set and build a model, and hopefully the model is good enough to capture the variation within the population. But really, the best way is to design the study well enough ahead of time: you prepare for the possibility that there will be failures down the road and keep extra samples, so if some fail you can get more measurements. With a small amount of missing data you can impute; there are computational methods for that. But if there is a hole across an entire population, how are you going to impute that? That's a grand challenge.

Okay, next question: Lindsay.

Hi, thank you so much. I was thinking, in the study-design space, about comments that Nancy and Tuuli and other people have brought up about how to exploit, I guess, the correlation between data types. Nancy had mentioned that if you've got a vast database, you can see that while these sets of patients don't have genetic data, there's a similar cohort of patients with genetic data, and you can kind of infer that they may have a similar underlying genetic cause,
and then Tuuli's comments about how we've got a lot of correlation between tissues, which drills down to cell-type commonalities between those tissues. And then there was another comment that there seems to be more sharing among molecular phenotypes than at the genetic level, and that this can compensate for the lack of genetic diversity in a lot of the studies that have been done: you can look at these molecular phenotypes across populations and get at the root cause of disease. This is really just a broader question: are there ways we can combine these thoughts, these correlations, and these sharing models and say, if we collect these kinds of data and this kind of information, this is the core set that can transcend some of the limitations of the broader genetic studies we're running, where we're lacking population information and potentially the genetic information, but we have RNA or something else? It's a broader thought process of really drilling down to which modalities will give us the biggest bang for our buck in a larger cohort of any given disease. I saw that thread running through some of the talks today, and I just wanted to bring it up for discussion.

Yeah, I agree; that's kind of what I was alluding to. If, on the data integration side, you can understand which things give you the best bang for the buck, then you can design, at least for the study you're interested in, and it's going to be different if you're looking for biomarkers versus disease mechanisms, et cetera. But if you can start to understand which modalities are the most useful, then you can design the study up front to say, well, at a minimum we need to get these things, or if you can't get this, then you need to get that. Those things might help bring the cost down to something manageable, and ultimately help bring the statistics to something more robust.

Yeah, and I guess the more discrete question is: are we there? Do we have enough information to make those determinations for some diseases or some data sets of interest, or is that a gap that still needs to be filled in certain ways, and if so, how should we fill it?

I think it's a gap, but I'm not enough of a statistician or computational person to really say how much of a gap it is. It seems to me that if we're there, it's not obvious to everybody. It might be that we are there, that the data exist and just haven't been put together for this kind of effort, and I'm not sure which of those will end up being true.

Okay, so I think, Lindsay, you hit on a hard question, and nobody has anything more profound to impart at this point.
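One empirical way to approach Lindsay's bang-for-the-buck question is to score each candidate modality separately by cross-validated predictive performance and compare. A toy sketch with simulated placeholder data; the modality names, effect sizes, and the logistic-regression model are arbitrary assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 300
y = rng.integers(0, 2, n)  # binary disease label

# Hypothetical feature blocks, one per modality, with different signal strengths.
modalities = {
    "rna":         rng.normal(size=(n, 200)) + y[:, None] * 0.20,
    "methylation": rng.normal(size=(n, 150)) + y[:, None] * 0.10,
    "metabolome":  rng.normal(size=(n, 50)),   # no signal in this toy setup
}

# Rank modalities by how well each alone predicts the phenotype.
for name, X in modalities.items():
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          cv=5, scoring="roc_auc").mean()
    print(f"{name:12s} mean cross-validated AUC = {auc:.2f}")
```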
So let's move along. I think it's Tiago. Tiago, you still have a question?

Yeah, hi, hello. I was wondering if the panelists could comment on the relative contributions of different aspects to the data integration challenge. I see this routinely in a lot of papers: people struggle to explain the functional links between discordant data sets. We assume, for example, that DNA methylation will lead to gene silencing, but that's not always the case; we assume that RNA is going to be translated into protein, but that's not always the case. So one aspect is that we still don't fully understand, from a mechanistic perspective, the relationships between different types of data. Another aspect is the static component that we've already talked about: do we need really tight time courses, a lot more longitudinal data, to really understand the dynamics and the directionality of the relationships between these data sets? Or are those two things actually just minor aspects, and the major gap is that we just don't have computational tools refined enough to infer what is meaningful in these data sets, so we need more sophisticated machine learning tools? I was wondering if you could comment on the different aspects that contribute to the challenge of data integration.

I can comment on that. In fact, the fact that most of the population measurements are static, one snapshot, really is a problem; I think it contributes part of the problem. You see a lack of correlation between different omics spaces. If you assume the system is static, in steady state, then that correlation assumption would hold more or less; however, we know that's not always the case. In fact, there's recent work from Fabian's group where they applied non-steady-state assumptions to single-cell transcriptomic data, improved the RNA velocity algorithm, and better resolved some of the issues, the backflows, in the RNA-seq data sets. So I do think that's the direction to go if you can get time-series measurements; time-series genetics is the other thing.

Yeah, I agree with both of those. I think there's probably a gap on both sides: a gap in terms of being able to gauge what we can get from the current data sets, but also a gap in that we don't always have enough longitudinal data, and the times when we have gotten longitudinal data, we've learned a lot more. So it seems to me we've got a gap on both sides: computation, to make the most of these cross-sectional single-time-point data, but also really understanding the temporal variation, which to me is almost a completely different ball game. The tools may be the same, but I think it gives you a different insight into the biology.

Ajay,
you're muted. Thank you. We have a little less than a minute left, so I'm going to pick a question from the chat. This extends Tess's question, and also touches on Sarah and the HCA's work: what are some of the bottlenecks and solutions in more comprehensive multi-omics studies in a global context, both regarding sampling and regarding developing the capabilities to do omics assays and data analysis outside the Western world?

Sorry, I had started to type a response to that in the chat, so now I get to delete all of that. I think there's a challenge, to some extent, in selling the utility of doing a lot of this work, because you also have to sell the idea that we want to sample people multiple times on multiple occasions, and that we want to take multiple samples from each person on each of those occasions; sometimes that's a tough sell in terms of the return on investment for some groups. I also think there's a governance issue, which is probably largely around education, because the same thing happened with the genomic data that was generated: how is this going to be used, will it have some benefit to us, and will there be loss of information and privacy, et cetera? So I do think there are governance issues that have come up as we have started to do this, at least in H3Africa, but there's also work left to be done to convince people that there's utility in having these multiple modalities assayed on each individual. So I think these issues are surmountable, but I also note that there are large collections going ahead that are very unidimensional.

Okay, so with that I would like to thank Neil, Lana, and Sarah for an excellent sort of breakout session, discussing some of where we could go and what challenges still remain out there. There was also an excellent set of questions and discussions in the chat; you can actually save the chat if you want. With that, I'll bring the session to an end and hand the microphone back to Erin.

Ajay, thank you. At this point we will turn it over to our co-chairs, Judy and Howard, to provide a quick summary of today, which is not necessarily an easy thing to do.

Thank you, everyone, for a very stimulating day. Just to briefly refresh: we've heard quite a bit about the vision for NHGRI and the current portfolio, the really growing area of investigation into multi-omics, some of the exciting work that's possible, and also some of the challenges of really scaling up this area. Among the themes I've heard today are scale and the diversity of projects that can be done using these new technologies. Judy, what do you want to add to that?
Yeah, I think the charge we got from NHGRI is: there's a lot of great work going on, which we've heard about, so what are the specific recommendations for NHGRI in terms of staying at the forefront? It's obvious there's a data explosion. I actually think Neil Hanchard gave a terrific summary of the triangulation of the three different things, the layers and so forth, but I would go back to the keynote talks from Nancy Cox and Mike Snyder: a lot of common themes. Nancy's charge, I think, is extremely well taken: we've done a lot of great research, substantial advances, and that's the excitement of doing research, but clinical implementation is slow, and I think we're all feeling that tension between the advances in research and what you can actually implement. The value of the passive data collection you get through electronic health records or wearables is amazing; as a community we have to make sure that doesn't actually worsen health disparities, and you can easily see how it might. The older I get, the more interested I am in aging questions, so I appreciated a couple of very nice references to aging. There were also a couple of dichotomies in my notes: n-of-1 studies versus population-based references, and rare variants versus common variants. Multiple speakers talked about what's rate-limiting, and it's interesting: we've largely solved the genetics part, but on the phenotype part I think we all agree the question is how you track disease at the earliest stages, so that you really capture the transition point between health and disease. And then we had a very nice discussion about what we're missing right now. So that's what I took from this. Any other comments we want to make? Those are my major cross-cutting themes.

Well, thank you, Judy. I think maybe we can bring today's meeting to a close. Just to remind everybody, the meeting continues tomorrow, and there will be many more discussions; perhaps tomorrow can be a little more directive, thinking about what specific recommendations we might want to make. And I see Erin also has a comment.

Yeah, thanks, Howard. Before we sign off, I did want to come back to what I mentioned at the beginning about Juneteenth being commemorated as a federal holiday. I just want to quickly share my screen and put up a slide, if that's okay with everyone. Here we go; can you see that okay? As I mentioned at the beginning, we're really excited to hear that President Biden is creating this federal holiday to celebrate and commemorate Juneteenth. We were expecting to hear a little more from the Office of Personnel Management and HHS and NIH leadership throughout the day, but we haven't quite received that information yet. So I did just want to let everyone know, especially our federal employee colleagues, that we do plan to hold the workshop tomorrow, but we completely understand if any of you would rather use your day tomorrow reflecting on and celebrating this holiday. I put up on the screen a campaign with the hashtag #MoveForEquity, which is being celebrated at the NIH in collaboration with 8 Changes for Racial Equity, so if you have a chance, you can check that out as well. I just wanted to put this out there; if we hear anything more, we will reach out to you during the evening, but otherwise we're really looking forward to seeing you tomorrow. This has
been an incredibly productive and insightful day. Joannella, did you have anything else you wanted to say before we sign off?

Thanks, Erin. No, I just wanted to say thank you to everybody for joining. It's been a very stimulating discussion, and I am very much looking forward to tomorrow. See you tomorrow at one o'clock. Thank you very much.