Welcome, everyone. We have registrants with us from 25 countries today, and we're so glad to have you here for the first in our new seminar series called Genomic Innovators. I'm Dr. Chris Gunter, and I serve as NHGRI senior advisor to the director for genomics engagement. I'm very pleased to host today with my longtime friend Dr. Lisa Chadwick.

Hi, I'm Dr. Chadwick. I'm a program director in NHGRI's Division of Genome Sciences, and I lead several of our programs that target early-stage investigators, which include the Genomic Innovator Award.

We also want to strongly thank the team at NHGRI who helped us organize the series, particularly Susan Vasquez and Gerald Samani, who are both wonderful and have helped us do a number of these, so thank you to them. On this first slide you can see that, as I mentioned, today is the first one in this new series. We plan to host them quarterly, so the next one will be January-ish. You can find more details for the series on the webpage listed at the bottom here if you're interested in future seminars, and we'll make sure to let you know about them as well. They'll be posted on our YouTube channel afterward, so if you enjoyed today's talk, feel free to share it later with your friends who could not make it, because we all know there are competing meetings. The series is named both after the extramural funding program we had in the past, which Lisa was responsible for, called the Genomic Innovator Awards, and after the fact that we view both of these speakers as true innovators in the field of genomics. On the next slide you can see more about our speakers, and I'm going to introduce both of them to you. Luca Pinello is an associate professor at Massachusetts General Hospital and Harvard Medical School. He earned his PhD in computer science and mathematics from the University of Palermo in Italy.
His research program uses computational approaches, which he'll tell you about today, to systematically analyze the sources of variation that affect gene regulation, including epigenetic variation, genetic variation, and single-cell gene expression variability. Our second speaker, Dr. Karen Mohlke, is professor of genetics and associate chair for research at the University of North Carolina School of Medicine at Chapel Hill. She earned her doctoral degree in human genetics at the University of Michigan. Her research program identifies genetic variants that influence common human traits with complex inheritance patterns, and also seeks to understand the biological function of identified variants and the interactions between genes and environmental risk factors in disease pathogenesis. While we're talking, we encourage you to put your questions for the speakers into the Q&A section, which you should be able to see at the bottom of your screen. Dr. Chadwick will then ask those questions of the speakers during the question period when the talks are finished. And with that, I'll turn it over to Dr. Pinello.

Hi, everyone. It's a great pleasure to be here, and I'm really excited to kick off this seminar series. Today we're going to play genomic Battleship with CRISPR technologies to uncover non-coding functional elements and their phenotypic effects. Can you please enable control of the slides? It's not working. Should I share my screen instead? Okay, nice, it's working. So I want to start with my disclosure slide. Then I want to briefly outline what we're going to talk about today. I want to briefly mention my transition from computer science to computational biology, and how NHGRI really enabled this. Then I want to talk about the DNA puzzle and where we can focus in the genome to understand function.
Then I'll propose a framework for a multi-scale exploration of the non-coding genome using CRISPR tiling screens. In the second segment, which Karen will present, she will highlight disease variants in non-coding functional elements and CRISPR technologies for variant and element function, and also look at future opportunities. My lab is trying to combine computational and experimental strategies to understand gene regulation, and we work with different technologies and assays, including epigenomics, CRISPR genome editing, and single-cell omics. We are trying to answer two specific questions. The first one is: how can we uncover and dissect functional non-coding elements in the genome? The second one is: how can we model cell fate choices? Today we'll focus on the first one. This is just a brief slide about my career trajectory, starting from a little town in Italy called Altavilla. I did my PhD in computer science in Palermo, and as you can see from this slide, my first transition to computational biology was in 2011, during my postdoc. Throughout the years, as you can see, NHGRI has been really important in providing funding to get me to the next stage. As a postdoc, I received a K99. Then I was promoted to assistant professor, and in 2019 I received the Genomic Innovator Award, which gave me so much freedom to explore my research direction. When I was promoted to associate professor in 2021, I joined the NHGRI IGVF Consortium, leading a characterization center. I just wanted to highlight this because there are a lot of funding sources that may enable other people to make the same transition.
This really gave me the freedom to create several computational tools over the years, to do team science, and to enable other researchers to do cool analyses related to epigenomics and CRISPR genome editing, and also nice and crazy visualizations in virtual reality for single-cell genomics. Going back to the question: I was really passionate about the genome because there are so many puzzles yet to solve. As you know, the first good draft was completed in 2001, but it was not until this year that we had the full, complete sequence of the genome. And there are still several unknown parts. If I highlight a random region of the genome and ask you what the function of this small segment is, then if it is a coding sequence it may be easier to spell out the function; if it is a non-coding sequence, maybe it's not so easy. The non-coding part is really important, because 98% of our genome is non-coding. Still, 80%, depending on which assay you use, has some biochemical activity. Most importantly, many of the disease-associated variants are in the non-coding space, and Karen will expand on this point more. What really made me passionate, as a computer scientist, about studying genome function was essentially this puzzle: how come almost all the cells in our body share almost the same code, but they have different outputs, phenotypes, and gene expression programs? It was really fascinating.
Over the years I learned that one of the explanations for this puzzle is the chromatin structure. In fact, the way the DNA is folded is really different in different cell types, and different types of marks on the DNA essentially help to specialize this code. These are, for example, nucleosome positioning, DNA methylation, and histone modifications, which you can think of as a way to enable or disable parts of this genetic code and to create gene expression programs that are cell-type specific. This can also help us think about where we want to focus in exploring the genome. The genome is quite large, with three billion nucleotide pairs, so we need to think about where we want to focus in this exploratory game. What do we need in order to be smart? First, we need some mechanism to highlight regions that are important. Second, if we really want to understand function, we need to perturb these regions, and for this we have gene editing technologies like CRISPR. This can lead us to understand progressively, and we can iterate this framework. For the first step, highlighting, there are many ways to highlight, based on genetic variants or conservation. What I am proposing here, and what many people are proposing, is to use chromatin accessibility and, in particular, histone modifications, because these can highlight regions that are active and cell-type specific. In fact, over the years we have learned that different histone modifications are associated with different biological functions. For example, if you have a gene, you can immediately find promoters that may be active, poised, or repressed, and genes that are transcribed. But one important point is that multiple marks essentially allow you to highlight the distal regions called enhancers.
These are really important regions in gene regulation because they are so cell-type specific, and they fine-tune gene expression in different cell types. So this is the way we can potentially highlight, and I want to point out that we have so much data, thanks to Roadmap Epigenomics and ENCODE, which profiled histone marks, chromatin accessibility, and the binding of different proteins for many, many cell types, including primary cells. In terms of perturbation, there are different genetic technologies. One that really changed the way we can do large-scale screens is CRISPR genome editing. It is composed of a Cas9 protein that has cleavage activity, and a guide RNA that can be programmed: you synthesize a region of around 20 base pairs, and by simple complementarity rules you can target the system almost anywhere in the genome. The system will introduce a double-strand break that the cell will try to repair. In this process you will introduce short deletions or insertions, or you can trigger other repair mechanisms by providing a small piece of DNA to obtain a precise replacement. This is a really powerful technique; you can easily design thousands of perturbations if you wish.
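The programmable part described above (a roughly 20-base-pair target chosen by complementarity) can be illustrated with a minimal sketch. It assumes the commonly used SpCas9 enzyme, which requires an NGG motif (the PAM) immediately after the target; the PAM is not mentioned in the talk, and the function name `find_guides` and the toy sequences are hypothetical. Only the forward strand is scanned, and no scoring of efficiency or off-targets is attempted.

```python
import re

def find_guides(sequence, guide_len=20, pam_regex="[ACGT]GG"):
    """List candidate target sites on the forward strand only.

    Returns (start, protospacer, pam) tuples. The default PAM pattern
    assumes SpCas9 (NGG); other enzymes need a different pattern.
    """
    sequence = sequence.upper()
    # A lookahead keeps matches zero-width, so overlapping sites are all found.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(sequence)]

# Made-up sequence, for illustration only.
toy_locus = "TTGACCTGAAGCTAGGCTAACGTTAGCATGCAAGGTCCATGG"
for start, protospacer, pam in find_guides(toy_locus):
    print(start, protospacer, pam)
```

A real design would also scan the reverse complement and filter guides by predicted activity and specificity; this sketch only shows why almost any locus offers many candidate sites.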
Over the years, people have modified Cas9. They disabled the cleavage activity, giving this dead Cas9, and now you can fuse other effectors or epigenetic modifiers to it. There are different versions of this, with names for the different versions. The general names are CRISPRi, which allows you to epigenetically repress a region in the genome, and CRISPRa, which allows you to activate. For example, you can repress genes by targeting their promoters, or create new enhancers in non-coding regions, so it is a really exciting technology. In terms of understanding, I want to tell you a brief story that really shaped the way I think about these problems, and also my career. During my postdoc in Guo-Cheng Yuan's lab, I had the fortune to collaborate with Stuart Orkin and Daniel Bauer. They were really interested in studying blood diseases such as sickle cell disease, and in 2013 they uncovered an enhancer that was enriched for genetic variants associated with fetal hemoglobin. At that time, we decided to actually delete that enhancer using gene editing, and we saw that this enhancer was for a gene called BCL11A. Indeed, it was controlling BCL11A, and since BCL11A is a repressor of fetal hemoglobin, the perturbation of this enhancer element was able to reactivate fetal hemoglobin in a cell-type-specific fashion. This is really important because, as you know, we have different hemoglobins: the fetal one, which is expressed only at the fetal stage, and the adult one. So if you find a genetic switch, a perturbation that re-enables fetal hemoglobin, you can ameliorate, for example, sickle cell disease by providing a backup option.
In 2015 we again used CRISPR technologies. This time, instead of deleting the full enhancer, we did a saturation mutagenesis screen, and we uncovered that you don't actually need to remove the entire enhancer to affect BCL11A. We discovered a small critical element that was a binding site for two transcription factors, GATA1 and TAL1. So you need a single perturbation of around 10 base pairs to reactivate fetal hemoglobin. In 2019 we showed that this can be a potential therapeutic avenue. You can envision modifying human hematopoietic stem cells ex vivo with this CRISPR technology, with a single perturbation, and then reinfusing these cells, and this can potentially ameliorate sickle cell disease. And this is actually a reality. In fact, in 2020... I cannot advance, oops. Sorry, the slide advance is not working yet. As I was saying, this is truly translational. In fact, in 2020 the first patient was treated with this idea, and there are several clinical trials, including one led by Daniel Bauer at Boston Children's Hospital. This year there are already 20 patients who have had a really positive response, and I was really lucky to be part of this project, developing the computational tools that enabled some of these exploratory analyses. Expanding on that, this is a really powerful framework that we can apply to other regions of the genome. There is this concept of a CRISPR tiling screen, which, as my title mentions, is kind of like playing Battleship against the genome. Imagine a board where there are a few things that you want to discover.
Now you have a high-throughput way to throw perturbations on this board, and you have a way to see which perturbation is associated with the phenotype of interest. You can potentially target enhancers, variants, and so on. In my Genomic Innovator Award, I was essentially proposing this: you can do it in multiple ways, but how can we do it optimally? The idea is that we can combine different CRISPR genome editing technologies to efficiently first uncover and then dissect these regulatory elements. What I mean is: imagine you have a locus or a gene of interest associated with a phenotype. You can design different perturbations. You can start at low resolution, using CRISPRi and CRISPRa, which have roughly 1 kb to 200 base pair resolution, to uncover the first, large elements. Then you can use genetic perturbation with Cas9 to tile these regions precisely and uncover the sub-elements that are important. You can even envision using base editing or prime editing to go down to one base pair resolution, and this is something we're developing and are really excited about. There are already CRISPR tiling screens that our group has performed, for example this one for the BCL11A locus I was mentioning, and other groups are also developing these screens. So there is a clear need to develop the technology to enable other people to do this at scale and for different phenotypes. In this slide you see three papers using three different technologies, Cas9, CRISPRa, and CRISPRi, for different goals: enhancer dissection, as we did, or enhancer discovery, as in these two other papers. They each propose their own approach in terms of the analysis.
There are different ways to do this, and I would propose one way that essentially unifies all the different strategies that we have so far. Just to provide a little more detail on the CRISPR tiling screen: the idea is that we start from a locus of interest with guide design, which means that we take all the small sequences that we can use to target our perturbations. Then we create our library of perturbations and lentivirally transduce our cells, so that we introduce the perturbations into the cells, and we want to have one perturbation per cell. Then you can sort based on a phenotype of interest. For example, it can be the fetal hemoglobin level, the expression of a gene of interest, viability, or a cell-surface marker; this depends on the phenotype that you want to study. Now that you have these two populations, you can try to recover the guide RNA enrichment in the high and low populations. To provide a little more detail, what you do is sequence. In each cell you have a lentiviral integrant that is integrated in the genome, and the part around the guide RNA is the same for every perturbation, so you can design primers, amplify, and create this nice table of counts. Based on the log fold change between these two numbers, you get one track that is the enrichment score. Based on this you can understand which part of the region is important and associated with your phenotype.
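The enrichment score described above, a log fold change between guide counts in the high and low sorted populations, can be sketched as follows. This is a minimal illustration, not the speaker's pipeline: the function name, the pseudocount, the normalization to read fractions, and the read counts are all assumptions made for the example.

```python
import math

def enrichment_scores(high_counts, low_counts, pseudocount=1.0):
    """Log2 fold change of each guide's read fraction, high bin vs. low bin."""
    guides = sorted(set(high_counts) | set(low_counts))
    # Add the pseudocount before normalizing, so unseen guides do not divide by zero.
    high = {g: high_counts.get(g, 0) + pseudocount for g in guides}
    low = {g: low_counts.get(g, 0) + pseudocount for g in guides}
    high_total = sum(high.values())
    low_total = sum(low.values())
    return {
        g: math.log2((high[g] / high_total) / (low[g] / low_total))
        for g in guides
    }

# Invented read counts for three hypothetical guides.
high_bin = {"guide_A": 900, "guide_B": 100, "guide_C": 500}
low_bin = {"guide_A": 100, "guide_B": 900, "guide_C": 500}

for guide, score in enrichment_scores(high_bin, low_bin).items():
    print(f"{guide}: {score:+.2f}")
```

Normalizing to fractions first keeps the score from being dominated by differences in sequencing depth between the two sorted populations; positive scores mark guides enriched in the high bin.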
This all seems simple at a high level. We start from reads in FASTQ files, and what we want is something like this: a denoised signal that tells us about this association, and maybe some statistical test that tells you, okay, this region is the one to follow up. But there are a lot of steps that you need to get right in between. In order to do that, we created a tool called CRISPR-SURF, which is based on the idea of deconvolution. This was led by Jonathan Hsu, who was a PhD student in my lab. The idea of deconvolution is this: we want to uncover the true underlying regulatory signal in the genome, but we have an imperfect measurement system, which is our CRISPR perturbation, and this introduces some distortion into what we observe. With deconvolution we want to go back to the true signal, and this depends, of course, on the kind of perturbation that you use. If you use genetic perturbation, you have a resolution of around 10 base pairs, and these are small indels. If you use epigenetic perturbation, the perturbations are much larger, so you have a lower resolution, around 200 base pairs. In the measurement problem, you can envision that the CRISPR perturbation is how you measure the signal here. If you had the perfect tool, you would observe the perfect reconstruction, but with this you get a smoothed version, and this operation is called convolution. So now you can frame this as a convolutional model. You have the signal you want to recover, and you have your perturbation with CRISPR. Of course we have noise, and what we observe is the noisy signal. From it we want to reconstruct the original.
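The forward measurement model described above (true signal, blurred by the perturbation, plus noise) can be simulated in a few lines. This is a sketch under stated assumptions rather than the CRISPR-SURF implementation: the Gaussian shape of the perturbation profile, the use of the quoted resolutions (about 10 bp and about 200 bp) as standard deviations, the noise level, and all names are choices made for illustration.

```python
import numpy as np

def perturbation_profile(width_bp):
    """Gaussian blur kernel (assumed shape), truncated at 4 sigma and summing to 1."""
    half = int(4 * width_bp)
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / width_bp) ** 2)
    return kernel / kernel.sum()

rng = np.random.default_rng(0)

# Hypothetical true regulatory signal: one 10 bp critical element in a 2 kb window.
true_signal = np.zeros(2000)
true_signal[1000:1010] = 1.0

# Noise-free views through a narrow (Cas9-like) and a wide (CRISPRi-like) profile.
cas9_view = np.convolve(true_signal, perturbation_profile(10), mode="same")
crispri_view = np.convolve(true_signal, perturbation_profile(200), mode="same")

# What a screen would actually report: the blurred signal plus measurement noise.
observed = crispri_view + rng.normal(0.0, 0.005, size=true_signal.size)

print("peak height, true signal:", true_signal.max())
print("peak height, narrow profile:", round(float(cas9_view.max()), 3))
print("peak height, wide profile:", round(float(crispri_view.max()), 3))
```

The wider the profile, the flatter and broader the observed peak, which is why the epigenetic perturbations localize an element less precisely than the genetic ones. Deconvolution is the inverse problem: estimating `true_signal` from `observed` given the profile.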
For this, the good news is that there are several frameworks for deconvolution. Here we propose to use the generalized lasso, which is essentially a way to go from something like this to something that parsimoniously tries to explain what we observe in the signal, and you can learn more in our paper. We tested this on multiple CRISPR screens, and we showed that this is a generalizable approach: no matter which perturbation type you use, you can recover the same validated elements and also new candidates. We also did a dual screen targeting the BCL11A locus and several accessible chromatin sites with CRISPRi and Cas9. One thing that is nice when you have this dual data side by side on the same region is that you can clearly see the value of this multi-scale resolution. With CRISPRi you recover these three enhancers, but with Cas9 you recover the critical element, at a much finer resolution. Also, if you use genetic perturbation you can assess the effect of coding regions, whereas with epigenetic perturbation you cannot do that. Finally, I just want to say that our lab is really passionate about building tools that are usable and user-friendly. So we built this website that can assist you in the design of the experiment, as well as in the modeling, the deconvolution, and then the follow-up annotation and analysis. This is online, the code is open source, and if you want to try it, please let me know and we're happy to provide support. On the last slide there are some exciting things. I just showed you some tools based on CRISPRi, CRISPRa, and Cas9. But now we are using these new editors, base editors and prime editors, which allow you to have really high resolution, potentially changing a single nucleotide.
We are also exploring a new framework for CRISPR screens, where we envision that instead of reading the guide RNA enrichment, we can directly read the alleles, and we can potentially develop tools to leverage the allele frequencies in the different populations. With this I would like to acknowledge current collaborators, past collaborators, and people in my lab, and especially my funding source, NHGRI, which really helped me make this transition and supported my career at multiple stages. For people who are excited about CRISPR technologies, there is a free conference next month, and we have stellar speakers. So if you want to learn more, please join us; it will be online, and you can also send an abstract if you want. I will stop here and hand off to Karen. Thank you.

Okay. I will now add some context to the use of CRISPR technologies to identify mechanisms responsible for common diseases that involve variants in non-coding regulatory elements, and then some perspectives on the future. Many common variants associated with common diseases and their related traits have been identified by genome-wide association studies, or GWAS analysis. The example shown here is a recent result of a genome-wide association study for variants associated with LDL cholesterol levels. However, most of the variants that may be responsible for these association signals are located in non-coding regions. Thus, a key goal in understanding the heritable contributions to disease risk is to identify the genes responsible for the associations. And identifying the specific variants and their molecular mechanisms rigorously links disease risk to gene function. For non-coding variants, a frequently applied strategy is to examine the variants located in regulatory elements that correspond to disease-relevant cell or tissue types.
I'm going to show two examples where experimental manipulations with CRISPR-based technologies provided needed validation of the hypothesized links between variants and genes. These hypotheses are based on genetic strategies to evaluate how variants exhibit regulatory effects. First, we hypothesize that variants that are associated with gene expression levels and also with a trait or disease may function through that effect on that gene to influence the disease. Then, to further understand how the variants may be affecting gene expression levels, we examine the chromatin context of those variants. Variants associated with chromatin characteristics such as accessibility that are also associated with gene expression and a trait or disease may be acting through that effect on the regulatory element to influence the gene and the trait. So here's a first example, a genome-wide association study for glucose levels, zoomed into this one region. This was performed in about 58,000 individuals. The same set of variants is also associated with islet RNA-seq-based expression levels of the gene in this region, ADCY5. The alleles associated with lower gene expression level were associated with higher glucose levels and higher risk of diabetes. The function of this gene and its expression in pancreatic islets suggested that the variant may act to alter insulin secretion. So when examining the set of variants that could be responsible for this association signal, we compared the locations of those variants to regulatory elements, especially in pancreatic islets. You can see that this variant is located within a region of accessible chromatin based on two technologies, FAIRE and DNase hypersensitivity, and in a region that is in an enhancer chromatin state in islets and in not many other cell types.
So, as a first step to understand mechanism, we tested that particular trait-associated variant, in its regulatory element, for effects on transcriptional activity. As shown here, we analyzed that variant in a reporter assay in the regulatory element. The regulatory element with either allele shows enhancer activity; however, the A allele at the variant shows reduced enhancer activity. That's consistent with the lower levels of ADCY5 expression in the eQTL analysis. So while this experiment shows that the variant is capable of altering transcriptional activity, it does not validate the link from the variant to the gene or to gene function. For that manipulation, we turned to CRISPR. The regulatory element spanning this variant is conserved across species, allowing us to use a rat cell model that secretes insulin well. In this experiment, we generated two double-stranded breaks, deleting the intronic regulatory element from the genome, and we obtained clonal lines with homozygous deletions. We then measured the effect of these deletions on gene expression and on cell function. Here I show that when the regulatory element is deleted in the homozygous enhancer deletion clonal lines, the expression level of the ADCY5 gene is reduced compared to lines that were created with guides that were not targeting and did not generate deletions. The expression of a nearby gene, SEC22A, was not affected, suggesting that the deletion was limited in its impact. What about the function of those cells? Does removing this regulatory element alter the cells' capabilities? Here I show that the cells with deletions of the enhancer show reduced secretion of insulin compared to the mock-edited cells.
So we can build a mechanism, looking at the variant, its effects on enhancer activity, the association with expression level in pancreatic islets from individuals, and the association with glucose levels and risk of diabetes. This evidence, now with the CRISPR-based deletion of the regulatory element, makes the connection between the variant, the regulatory element, and the gene, as well as its function and disease. Now, this application of CRISPR technology is pretty low throughput. Targeting efficiency is low, requiring extensive screening of clonal lines, and it really requires setting priorities among variant-containing elements. In this case, we were motivated by the evidence that the variant affects transcriptional activity, that the element had the characteristics of an enhancer in a limited set of cell types, that there were really few other candidate variants in elements that could be responsible for this GWAS signal based on those annotations, and that this element was conserved across species, along with some other experimental and annotation support: some allelic imbalance in ChIP-seq data and evidence of specific binding proteins. I'll now turn to an example of using CRISPR technologies to understand genetic effects on regulatory elements, gene expression, and disease. Here, we started by generating chromatin accessibility data in a pilot study of human liver samples from 20 individuals. We performed ATAC-seq to identify the accessible chromatin regions, identified peaks spanning the genome, and then tested variants that were located within one kilobase of one of these peaks, or regulatory elements, to ask whether the alleles at that variant were associated with the level of accessibility. In this study, we identified more than 3,000 regions across the genome where this was the case. We then compared these chromatin QTL signals to genome-wide association signals to identify potential functions.
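The chromatin QTL test described above asks whether the alleles at a variant are associated with the accessibility of a nearby peak. A minimal sketch of that idea is a regression of peak accessibility on allele dosage (0, 1, or 2 copies of one allele). This is not the study's actual pipeline: the function name is hypothetical, the numbers are invented toy values, and a real analysis would use normalized read counts, covariates, and correction for multiple testing.

```python
import numpy as np

def dosage_association(dosage, accessibility):
    """Least-squares fit of accessibility on allele dosage.

    Returns (slope, intercept, pearson_r). The slope is the estimated
    change in accessibility per additional copy of the counted allele.
    """
    dosage = np.asarray(dosage, dtype=float)
    accessibility = np.asarray(accessibility, dtype=float)
    slope, intercept = np.polyfit(dosage, accessibility, deg=1)
    r = np.corrcoef(dosage, accessibility)[0, 1]
    return slope, intercept, r

# Invented values for nine hypothetical individuals at one peak.
allele_dosage = [0, 0, 0, 1, 1, 1, 2, 2, 2]
peak_signal = [4.1, 3.8, 4.4, 6.0, 5.7, 6.3, 8.2, 7.9, 8.0]

slope, intercept, r = dosage_association(allele_dosage, peak_signal)
print(f"slope per allele copy: {slope:.2f}, r: {r:.2f}")
```

With only 20 individuals, as in the pilot study, such a test has power only for strong effects, which is one reason comparing the signals to GWAS and expression data adds confidence.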
An example of that chromatin association is shown here. These are profiles of ATAC-seq data from nine different individuals, shown across the width of the screen, and the peaks that are called as regulatory elements are shown below. You can see that for most of these elements, chromatin accessibility is similar across individuals. But in one region, the chromatin accessibility differs between individuals, and the genotype of a variant located within that regulatory element is shown over here on the left. You can see that individuals who are homozygous for the T allele have stronger evidence of accessibility than individuals homozygous for the C allele. So this is a chromatin QTL variant. The elements that are genetically regulated are a little more often found in enhancers and in promoters when compared to chromatin states from liver tissue in the Roadmap Epigenomics Consortium. The presence of some additional elements in regions otherwise termed quiescent suggests that perhaps the greater genetic diversity of the individuals profiled here helped identify additional regulatory elements in the liver. The region that I showed, which is a chromatin QTL, is also a signal that is associated with LDL cholesterol levels and with the expression level of the LITAF gene in liver samples. The LDL data is from around a million individuals in that GWAS study, and the eQTL data is from around 1,000 individuals. Here the same variant that was the lead, most strongly associated with LDL cholesterol and with gene expression, was also the most strongly associated with the accessibility of this peak at the 3' end of the LITAF gene. This gene potentially plays a role in more than one cell type within the liver tissue that we examined.
Within the regulatory element, there are really three variants that are found close together, located in the same regulatory element, and that are all proxies for one another. So again, as a first step, we tested these variants within the regulatory element to see if they could alter transcriptional activity. Here they were tested as a haplotype, and we examined four different cell types that could be playing a role in the liver tissue. In all four cell types we observed the same results: with both haplotypes the regulatory element serves as an enhancer compared to the empty vector, and the alleles that were associated with greater chromatin accessibility showed greater transcriptional activity. These are the same set of alleles that are associated with higher liver expression of LITAF in the eQTL analysis and higher cholesterol levels in the GWAS data. So, again, these data do not yet validate the role of the regulatory element on the gene. To do that test, we used CRISPR interference in a hepatic carcinoma cell line. Luca introduced the strategy earlier. We used guide RNAs spanning the enhancer and compared them to non-targeting guide RNAs. In this preliminary experiment, the knockdown leads to a significant reduction in the expression level of the LITAF gene. These data provide important validation of the hypothesized link to the gene. The CRISPRi system is much more amenable to higher-throughput studies, as guide RNAs can be designed to many regulatory elements, including promoters, to measure the effect of altering gene regulation as well as gene expression. So what does the future hold for CRISPR technologies to identify non-coding functional elements and their phenotypic effects? Well, first there will be improved CRISPR technologies: many technological advances, continued development of the biochemistry of editing, and the design of effective guide RNAs.
Guide design that better considers the diversity of human variants will both improve the efficiency of targeting and reduce off-target effects, especially as more diverse genomes are edited. Newer technologies, as Luca introduced, of base editing and prime editing hold promise for more precise nucleotide changes with higher efficiency. Improved design of the biochemical aspects of CRISPRi and its counterpart CRISPR activation can enable study of the consequences of a wider range of changes to gene expression levels. Screens can become more comprehensive and test a larger number of variants and elements in a wider set of cell types, cell states, and organisms. And improved single-cell technologies will enable DNA and RNA to be profiled at scale, facilitating understanding of genetic effects in these specific cell types. And finally, we can anticipate more in vivo applications, including to treat and reduce the impact of human disease. The future will also hold a more comprehensive understanding of non-coding functional elements and their phenotypic effects. Non-coding regulatory elements differ by cell type, and by the various cell states that differ during development and in response to perturbations of the cell environment. And non-coding elements also differ by genotype. And we'll learn about these effects through the CRISPR saturation mutagenesis screens, as Luca mentioned, and the study of natural human genetic variation in more and more individuals. One community that is performing this type of research is the new IGVF consortium. This consortium has multiple goals to understand the impact of genomic variation on function, and I've listed two of them here.
One is to apply technologies like the ones that we have described to systematically perturb the genome and figure out the effects of variants and elements on genome function and phenotype, and also to perform high resolution identification of where and when genes and regulatory elements function. Luca is a program director for one of the projects as described here, and he and his team are using base editing CRISPR screens to identify and examine the function of variants responsible for blood and heart diseases. And in one aspect of this project, Luca's team is collaborating with a team at UNC to examine variant function results across technologies and strategies. Finally, I'd like to acknowledge the people who contributed to the projects I showed, including the people in my group, past and present, and our collaborators at UNC and elsewhere. Great. Thank you both so much for those great talks. We have some time now to answer questions from the audience. So remember, if you have a question for our speakers, go ahead and post it in the Q&A box and I will ask it. I'm going to start with a question and then I'll ask the other ones we've gotten in the chat while you were both talking. So Luca, the work that you talked about uses tools and technologies that are still relatively new. How have those changed the way that you both have been able to address the questions that your lab is interested in? And how accessible are these kinds of technologies to other researchers? Yes, very good question. So I think we have seen a revolution, for example with CRISPR and single cell technologies. Initially they were not so accessible, but now so many labs can easily deploy this technology.
And one barrier sometimes is having the computational tools, but now there are several programs, including this genomic innovator award, that are helping researchers like me who are excited about developing tools to help others, right? So I think it's a nice framework. There are new technologies, and I think now we're much faster in adopting new technologies, and with sharing it's more open, we share reagents. So I see a really positive future. There will be new technologies, but we already have a system of sharing this technology, as well as computational tools. So yeah, that's my short view on this. Great. So one of the questions that we had in the chat was about the difference between discovering enhancers and dissecting enhancers. What kind of tools would you use? What kind of information are you looking for? Can you guys speak to that a little bit? I can start with that one and say that identifying where the enhancers are located and which cell types they are active in requires looking at, and having access to, the different cell types, tissues, and cell states of interest. I think that the dissection of the enhancer is identifying how changes to that enhancer have an effect. We've talked a bunch about DNA variants that have an influence on the enhancer, maybe acting by changing accessibility of the chromatin, or by changing the availability of the sequence to the transcription factors that bind and have their effects.
Yeah, maybe in terms of technology, one point I was trying to highlight in my talk: you can completely delete one sequence and see whether the gene that you think it is regulating changes expression, right? For this kind of direct assessment you can use CRISPRi: you can try to shut down the enhancer and see if the gene is changing. And then dissect means that you already know that maybe this is an enhancer, right? And now you want to know what is the critical part, what are maybe the ten base pairs or 20 base pairs that are important for this activity. And one example that we showed, in the BCL11A story, was that you just need to perturb a single binding site to shut down that enhancer. So that's the difference between the discovery and then the dissection. So I think the next three questions that I saw in the chat all relate to the types of things in biology that will still make it more difficult for you to do these kinds of experiments. So the first one is about the repetitive nature of the genome. Of course, there is everything from small duplications in the genome to even large-scale duplications like segmental duplications. How do you deal with that repetitive nature of the genome when you're designing these CRISPR perturbations? Does that make it more complicated for you to analyze the data? How do you deal with that in these kinds of experiments? I have learned to avoid those regions, so I think Luca can take this question. Yeah, I mean, this is really a good question. Depending on the perturbation size, you can say, well, I can try to stay on the side.
Hopefully, if you have, for example, again, CRISPRi, this thing will spread and it will still shut down or activate the region. But if you have a more precise perturbation, and you have the same region in thousands of places in the genome, then it's really hard to understand the effect of the perturbation, because you don't know anymore what you're perturbing. So, unfortunately, I don't have a solution for this. But it's a really good question. Yeah. So the next, oh, sorry, go ahead, Karen. Well, the related question about cell type and developmental stage offers opportunities to evaluate cells at different stages of differentiation, for example cells in a dish. And so with cells from multiple individuals, like iPSC-derived cells that are differentiated or in the process of differentiation into different cell types, you can examine both the effect of variants and any perturbations to those cells, and the change in developmental stage, to understand their impacts. And I think the next part of that question, about cell type context and how important that is, is: how do you even know whether you have the right cell type in your assay, and how do you deal with that uncertainty when you're designing these kinds of studies? So I'm looking. Oh, go ahead. I'm looking forward to, and making use of, the single cell data that is helping identify what the underlying cell types are, because the data that we generate in bulk tissue leaves that still a bit of a mystery. So identifying, say, the chromatin QTL by specific cell types can help elucidate where the action is happening. Yeah.
There are papers that already show, for chromatin QTL or eQTL, the value of actually decomposing this into different cell types or subpopulations, even along continuous trajectories, right? So you have way more power to detect these signals. Karen's been doing a great job of leading into the next part of the question, which is, as we move into things like single cell assays, that generates a ton more data and probably a ton more computational challenges. Can you talk about the biggest challenges of analyzing this data, and what maybe is the next important advancement in the field that's going to facilitate analysis of these large scale CRISPR screens in a single cell assay environment? Yeah, I can start from that. If we think not about computation, but about what is my dream, I think what we miss right now, especially if we want to connect element or variant changes in gene regulation to phenotypes, and you can think of maybe the easiest one, changing gene expression, is that the current assays are still based on the guide count. So you actually don't see the alleles that you're creating in the single cell. You have this RNA readout. So it's really hard to have a definitive, high resolution answer if you cannot read what you're doing in individual cells. In terms of technology, I know there are some efforts in this direction, but we are not yet there; they are not scalable. But that's the one thing that can change things a little bit. Again, in terms of computational analysis, it's becoming harder and harder. The scale is increasing, and we have several companies and labs that are innovating.
And the kind of skills that you need to develop, for example if you're a postdoc or a trainee or a research scientist: you need now to learn things related to big data, or start to learn how to use GPUs or other techniques that before were not necessary. But at the same time it is an exciting challenge, right? I'm not complaining, actually, as it is an opportunity to leverage these new computational technologies and data structures that were adopted in different fields. So there was a question in the chat about combinatorial regulatory interactions. Of course, one of these regulatory elements is probably not necessarily working by itself. But that adds a whole other layer of complexity to the whole thing. So how do you use these kinds of tools and methods that you talked about to look at the combinatorial effects of different regulatory elements? We're really stumping you guys here. Yeah, this is a good question. I mean, I think this is a super exciting area. And there are some technical limitations. If you think about single cell as a readout, how many cells can you really profile and explore? So if you already have a hypothesis about a small number of elements that can have some synergistic effect, you can design the screens. But if you want to think broadly, I don't think we have the technology and the funding at a level, or the time, to explore this large combinatorial space. So we need to be really careful in rationally designing the pairs or the combinations that we want to explore. So, yeah, that's my take on this.
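To give a feel for why the combinatorial space gets out of reach so quickly, here is a small back-of-the-envelope sketch. The number of candidate elements (1,000) and the number of cells profiled per combination (100) are assumptions picked purely for illustration, not figures from the talk.

```python
from math import comb

# ASSUMPTION: hypothetical screen sizes, chosen only to illustrate the scaling.
N_ELEMENTS = 1_000           # candidate regulatory elements
CELLS_PER_COMBINATION = 100  # single cells profiled per perturbed combination

def cells_needed(n_elements: int, k: int, cells_per_combination: int) -> int:
    """Cells required to profile every k-way combination of n_elements."""
    return comb(n_elements, k) * cells_per_combination

for k in (1, 2, 3):
    n_combos = comb(N_ELEMENTS, k)
    total = cells_needed(N_ELEMENTS, k, CELLS_PER_COMBINATION)
    print(f"{k}-way: {n_combos:,} combinations, {total:,} cells")
```

Under these assumptions, single perturbations need about a hundred thousand cells, all pairs need roughly fifty million, and all triples need billions, which is why starting from a small, hypothesis-driven set of elements is the practical route.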
I agree. Prioritizing, starting with elements that show an effect and then designing specifically which combinations to evaluate, is the practical short term way forward to be able to investigate those combinations. So I see a question in the chat that reminded me of one that I had planned to ask you about. So, Luca, you received the genomic innovator award to help you fund this work. That is, as you know, a grant mechanism that's a little bit different from the standard grant mechanisms. This is kind of the boring talk about NIH funding, but it uses the R35 mechanism, which is much broader in scope and less tied to very specific research goals. So how has that helped you pursue this kind of work? Has that flexibility been important in helping advance the work in your lab? Yeah, I mean, absolutely. I think this grant has been really transformative for me. I'm a really collaborative person, I'm excited about too many things. So having the flexibility to do things that will advance maybe one area, which can be gene regulation, but also the flexibility to collaborate broadly with many labs, this really enabled me to create many tools, because I'm really excited about new technologies related to gene regulation, broadly. And then you can go to different labs that generate data and design experiments together. And I think this is really different if you have a rigid structure where you have a particular disease or a particular phenotype that you want to study. That's really important research, and I like that model as well. But for this kind of team science, I think the genomic innovator award really enables you to do that without worrying too much about completing this sub-aim, right?
So, yeah, I'm really grateful for this, and it has really been helping my career in the past few years. Yeah, you mentioned the team science aspect. That was another unique feature of the genomic innovator award: it was open to people who had been involved in team science kinds of projects. And I think some of your previous work in ENCODE, and what you guys will be doing in IGVF, are good examples of how those team science approaches can be so important in helping us address big scientific questions that we couldn't really do quite as well on our own in individual labs. Yeah, totally agree. So the other question here then is also about funding, and it's about funding and maintaining software tools. That's also a challenge, and you've developed a number of software tools. How do you, in your lab, maintain them? How do you support them? How do you fund them in your lab? Yeah, no, that's a really good question. It's also a really hard question. It's really hard because I don't think there are specific mechanisms. Maybe this is my ignorance, but I don't know many mechanisms that are specifically designed to maintain software. Unless your software is so famous and foundational, if you have multiple software packages, it's really hard to maintain them. And my practical strategy is trying to use technologies like containers; people have probably heard about Docker. Essentially, the thing that you built when you developed it will always work, right? So this is a partial solution. To maintain it, you have to stay, if we can say, in this cage, right? You have this small environment.
So it's really hard then to maintain long term, but at least it will always be reproducible and can always run the way it was intended. So I think it's a really hard question, and maybe something where we can say we need more funds for boring things like maintaining software without any innovation. I'm not the funding agency, but I think it's a really important problem. So, yeah, whoever raised this, really good point, and I think we should all be aware that some software will be abandoned if there is no funding that will support the development and the maintenance. So I see there are so many good questions in the chat now. One of them is about something that NHGRI feels is really important and has been talking about a lot lately, which is the importance of including participants with diverse genetic ancestries in our research, and how we really need to build that in from the start so that we can make the work that we do applicable to all populations. How do you both think about that when you're developing studies in this area? I can start there. I learn a lot from every genetic variant that can be studied. And there are more variants present across the world in individuals of all populations than are found in just single populations. We follow up and try to identify the basis for the association signals identified across those different populations. All of the biology that gets discovered is relevant to all of us humans; it's just the opportunities that we have to characterize and identify them through the presence of those variants. And it means that when we test and evaluate the roles of variants, we can learn more about how variants and elements impact genes.
No, I agree with that. Also, for example, talking about IGVF, this was one specific point that we were thinking about together, thinking about being inclusive. And from my side, I'm working with geneticists that can help us to think broadly. There are some gaps that we need to fill, but it's nice to see that these themes are more and more important, and maybe also required for some funding mechanisms. So I'm going to ask another one of my questions. Karen and Luca both know that I'm involved in another NHGRI program called GREGoR. What GREGoR is doing is trying to address a problem that we have in human disease genetic research, which is that the standard way that we look for the mutations that cause different genetic disorders, which is called exome sequencing, is very often, like more than half the time, not able to find the mutation that causes the disease. And exome sequencing is one of those look-under-the-lamppost type assays where it's looking in the most likely place, the exons. But that's of course only a very small portion of the genome. These other mutations that are harder to find could be anywhere, and I'm sure a lot of them will end up being in things like these regulatory regions, like enhancers. You both talked about some great examples of how those kinds of mutations could lead to disease. But one problem is that when you're looking at such a big part of the genome, it's really hard to look through all of the variants and figure out which ones might be most important, and to decide which ones to follow up on.
So I wonder if either of you could comment about how these kinds of tools, or the resources that might be developed by programs like IGVF, might help researchers who are trying to look for those harder to find disease mutations. Yeah, that's a really good question. I think we have different ways to prioritize variants. In IGVF we have a lot of active discussions about including existing annotation, fine mapping, or other tools. And then thinking really hard about the fact that you have too many things that you can test, for example if you want to perturb them. So what is the optimal way to rank them if you want to efficiently explore this space? And then there are, for example, within IGVF, predictive modeling groups that are creating machine learning methods to help us potentially predict the impact. And we acknowledge that the tools are not perfect, but if we have this effort of creating perturbation data, we can improve them, something that within IGVF we call the active learning strategy. So we are envisioning that it's not something we are going to solve in one year, but over the years we will become better and better with these efforts. So I don't know, Karen, if you want to comment on that. I think the continued discovery of regulatory elements matters, recognizing that what we've discovered so far is mostly still at the basal cell state or tissue state, and that identifying the elements across more cell types, and in these perturbed environments, maybe those relevant to the rare diseases, could help identify and characterize those elements and make the elements available. So then when looking through that large number of potential genetic variants responsible for a rare disease, you have some guidance, some priorities, perhaps based on the most relevant regulatory elements.
And I see another question in the chat that's related to this. I think you probably answered it, but these kinds of methods could also be used to help understand some of the variability that we see in these Mendelian disorders. Even when we can find the mutation, we still know that there's a lot of difference in the phenotypes that you see in different individuals, even with the same mutation, and so these kinds of tools, and understanding variants in these kinds of elements, could help. So I see one question in here. Oh, so the Genomic Innovator Program of course is a program targeted at early career researchers, so people who had not had a major NIH grant before. Luca, what advice do you have for an early career person who's maybe still working on their PhD and who's starting to get into this kind of an area? My suggestions: some are obvious, like reading a lot of papers. But one thing that was really important for me: try to follow the technology, try to see what the new technologies are, and try to envision what may happen with these new technologies. Because there are these waves that we all experienced that were all related to new technologies, like single cell and perturbation technologies, and they were enabling so many new things. So I think this really helped me to see a little bit where the field was going. There were surprises, of course, but this is one suggestion I have. And then also being really open, trying to go to specific conferences where you have a mix of different technology and biological talks, and just connecting and talking to people. I think it's really empowering. There are two that I really like, and I'm not associated with them. One is Biology of Genomes at Cold Spring Harbor, and ASHG, which is also fantastic. So these are two meetings I try to attend, and they're really broad and you can connect with many people. So these are some suggestions.
So I cannot let this kind of a question pass without telling everyone who's on this webinar that another important thing in being successful in this kind of research is talking to the program officers at the Institute that is most relevant to your work. There are a lot of colleagues of mine, including me and other people in my division, who are here really to help you think about your research, how to get it funded, and what the best opportunities for that are. So make sure that you contact a program officer. And I also am putting a link in the chat, and I'm hoping that this shows for everyone, to a web page where we have put some information about funding opportunities for new and early stage investigators. There's some general information linked to different events that we have, like a grant workshop at ASHG, and funding opportunities that are open specifically to early career researchers. And you can talk at the conferences, that's another thing that is so important. Often you go to these conferences, and sometimes it's so nice just to chat in person. So let me look through these questions here. I saw one about, okay, we talked earlier about the importance of cellular context when looking at functional elements like enhancers. Of course, there's also the level of how different cells within an organ interact with each other, and how that might influence how different functional elements work. Can you apply technologies like this in more complex systems like organoids, and if not, what are the challenges in doing that? Or maybe neither of you has tried that yet. I mean, I've not tried that. It's a nice question. Again, I don't have personal experience with this in my lab, but I think it is a very important direction. So CRISPR editing has been implemented in model organisms.
So model organisms are maybe that step beyond organoids, and so it is feasible, and I think important when it's not just one cell type and you're not looking for some cell autonomous result, to be able to look at the relationship between cells and the impact of a change in one cell on how it influences the others. I see a couple of questions in here that are related to another important area for NHGRI, and that is about sharing data. So the one question is how do you share this kind of data that comes out of your lab from these kinds of experiments. And then how will programs like IGVF share the data that comes out of that program so that it can be of use for people, for example, who are trying to interpret whether a specific variant might be important in a disease? That's a really good question. I'm always saying we always need to share the raw data, or the closest thing to the raw data, and then we need to share as well the methods that we use to process this data and to reach our conclusions. For raw data we have GEO, for example, for sequencing data. For code we have GitHub or other methods, and there are different avenues where you can put the version of the software that you used. And then journals have been more and more demanding on this, in a good way, right? You need to actually share. And for other things there are figshare and Zenodo. So we have a system in place to share, and if we don't do that then it's our fault.
And I think it is ridiculous now to see some people say "we will share upon request." This is something that I think we should go away from. It should be the default that you share; you should not have this statement. And I know it's a lot of work to reach the point that you are confident in sharing your code and all the data, but I think that bar should be higher. It should not be possible to say, well, if you contact me I will share the data. In terms of IGVF, it is really nice to see that there is a lot of discussion and planning here, thinking really hard about how we can maximize the data that we are going to generate and share, and how to make sure that we share at a level that will be used, right? So there will be the raw data and a preprocessed state, and then there will be something that we call the catalog, which will essentially be something really accessible to the community that will also evolve over the years. So I don't know, Karen, do you want to add your view? Right, so IGVF, like many NHGRI consortia, is very interested in getting that data available, and available as soon as possible. And so it will be feasible to obtain data that's generated even prior to being published. The catalog idea is to try to combine the data from multiple different types of experiments and types of data characterization and predictions to make that data more accessible. That's quite the challenge, so I think that it is a long term process.
One of the goals of IGVF is also to make it possible for others to perform the kinds of experiments that are being done in that consortium, and so making protocols available, and, if there are standards identified between members of that community, getting feedback on them, combining them, and making those available. Right, there are a lot of cell types, cell states, and variants to cover, so it's more than just one consortium's work. Yeah, no, I agree, and I think you also highlighted one of the major challenges. We at NHGRI always want our data to have the broadest impact possible, but it is hard to understand, or even think about, all of the potential uses of the data out there in a way that makes it most accessible for people. So if you see these kinds of data sets and you have thoughts about what would make them easier for you to use, I'm sure that there will be opportunities to provide that sort of feedback. So I think we're almost out of time. I didn't get to all the questions, but I just want to thank everyone for asking them in the chat. I also want to remind you that this is a quarterly seminar series. We haven't scheduled the next one yet, but we imagine it's going to be in January, I think is what Chris said. So we hope you'll be able to join us for future talks. Keep an eye on our webpage and our social media to find updates about that. And I just want to again thank our speakers, Luca Pinello and Karen Mohlke, for these great talks and this really exciting discussion. And we hope to see you again at the next genomic innovator seminar.