I want to thank you all for staying with us until the last day. Now your patience will be rewarded by three additional fantastic talks. The first will be given by Dr. Bing Ren from UC San Diego, who has made many, many seminal contributions to the fields of epigenomics and 3D genome organization. Today he's going to tell us about the organization and regulation of the human genome. Bing.

Thank you very much, Feng, for that kind introduction. You set the expectations too high; I want to lower them. I want to congratulate you for staying here for three days; it's tough. I know it's a lot of information, so I don't want to give you more information. Instead, I want to talk about what lies ahead of us after we've managed to find all the functional elements in the genome, and that is to understand the organization and regulation of the human genome. The question starts from charts like these, which compile the results of many genome-wide association studies. The list keeps growing, and now the Precision Medicine Initiative is going to give us even more sequence variants linked to human disease. The good news is that we'll have more and more such information in hand very soon. The bad news is that we still have little information for interpreting the function of these non-coding variants, because the majority of the human genome does not code for proteins, and interpreting its function is still a major challenge. Over the last couple of days you have heard about the efforts of the ENCODE consortium to decode the functional elements in the genome. My lab and many others have focused on cis-regulatory sequences, whose function is to control the transcriptional output of the genome. We classify these into promoters, enhancers, and insulators, which carry out distinct functions in the genome.
We know there are thousands of promoters driving transcription of genes, potentially millions of enhancer sequences that modulate the activity of genes in a cell-type-, tissue-, and developmental-stage-specific fashion, and also many, many insulator elements whose function is to block enhancers so that they act on only a limited number of genes. Efforts over the last several phases of ENCODE have led to the mapping of such elements in the genome, or at least of predicted elements. It's now recognized that much of the human genome is devoted to such transcriptional regulatory elements. Roughly 13 percent of the genome is likely involved in enhancer function and about 1 percent in promoter function, and in any given cell type roughly 5 percent of the genome is devoted to transcriptional control of genes. That number may not sound impressive, but if you consider that only about one and a half percent of the genome codes for proteins, you can see that transcriptional control is a major part of genome function. We now have plenty of evidence that enhancers play a major role in establishing cell-type-specific gene expression patterns. This is a major concept that came out of the recent studies: there are so many enhancer elements in the genome. Before ENCODE, if you asked anyone which elements control transcription, most people would tell you it's the promoter, and in fact many clinicians today still use promoters to drive tissue-specific gene expression. I think the ENCODE studies have overturned that concept, and we now recognize that if you are interested in tissue-specific regulation of genes, you had better look far away from the promoter and focus on enhancer sequences.
By identifying tissue-specific enhancer elements, we can now determine the transcriptional regulatory pathways controlling lineage specification and development by examining transcription factor binding sites in the genome. These transcription factor binding sites frequently harbor mutations linked to human disease, and numerous examples have now been reported in the literature. What lies ahead of us? What should we expect in the coming years? I would like to focus on two related questions. We recognize that there are a huge number of enhancers in the genome, but how do they work? And which genes do they control? That is still an unanswered question, both from the point of view of basic mechanism and from the practical point of view of finding their target genes. Once you find candidate elements that harbor disease-associated mutations, the frequently asked question is: which gene is this element controlling? Many such elements are located in the middle of a gene desert or in the middle of many genes, and it's not always clear which gene the element controls. So when you have a mutation in such an element, you are still left wondering what its functional consequence is. Let's briefly review how enhancers work. This summarizes many years of biochemical, genetic, and genomic research. We recognize that enhancer activation begins with the binding of pioneer factors to chromosomal DNA. These recruit chromatin remodelers, which lead to the modification and eviction of nucleosomes at the binding sites. We call this a primed enhancer. It exposes additional transcription factor binding sites, which are then occupied by additional transcription factors that recruit further chromatin remodelers.
During that process a key step takes place that brings the enhancer into spatial proximity with the promoter. We call it a loop, although it may not be a loop; it could simply be a rearrangement of chromatin organization such that the enhancer and promoter are spatially next to each other. When that happens, the promoter is exposed to a whole set of transcriptional machinery, which leads to assembly of RNA polymerase at the promoter and initiation of transcription. Let me give you one example to illustrate this process. Over the past several years we have focused on a gene called SOX2 in mouse embryonic stem cells, and based on chromatin signatures we predicted that an enhancer about 130 kilobases away might be important for SOX2 regulation. The SOX2 promoter lies about 130 kilobases from this sequence, which is occupied by a number of transcriptional co-activator proteins and carries histone modifications that are telltale signs of active enhancers, such as H3K4 monomethylation and H3K27 acetylation. If you delete this sequence from both alleles, SOX2 transcription is lost from both alleles. If you delete the enhancer from one allele, you see an allele-specific loss of transcription from that allele but not from the other, and this is reciprocal. So this series of monoallelic and biallelic deletion mutants tells us that the enhancer we identified indeed controls SOX2 expression, and only SOX2, because neighboring genes are not affected by the deletions. Now how does a sequence like the SOX2 enhancer act over 130 kilobases to control SOX2 expression? We now generally agree that this happens because a spatial loop connects the SOX2 promoter to the enhancer. But how do we prove this?
We can prove this using two orthogonal strategies. One is chromosome conformation capture, initially invented by Job Dekker and since extended to multiple methods, one of which is called 4C-seq. You cross-link cells with formaldehyde, then perform restriction digestion and in situ ligation. The ligation joins DNA fragments that are spatially close. You then circularize the ligated junctions and use a pair of primers at a restriction site to amplify the inserts. The inserts can be sequenced and plotted in this fashion: the anchor, corresponding to the PCR primers, shows up in the middle, and its interacting sequences are shown on top as interaction frequencies and at the bottom as a heat map of p-values. So this kind of map tells us, for any sequence you use as an anchor, which other sequences it interacts with. You can do this experiment using an anchor at the SOX2 promoter, and you can see that the enhancer I just mentioned shows up as a very strong interactor of the SOX2 promoter. That is consistent with the model that these two sequences are spatially close despite their long genomic distance. If you delete the enhancer sequence, this long-range interaction no longer occurs. That indicates that one of the major functions of enhancers is to come into contact with the promoter to initiate transcription of the gene. We can use an alternative, orthogonal strategy to prove the same point, and that is fluorescence in situ hybridization. You can design probes, usually fosmid probes, that cover SOX2, cover the enhancer, or cover a control sequence further downstream at an equal distance.
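To make the 4C-seq readout concrete, here is a minimal Python sketch, assuming each sequenced read has already been mapped to the genomic position of the fragment ligated to the anchor (viewpoint). The function names, the 5 kb bin size, the 20 kb exclusion window, and the read positions are illustrative assumptions, not values from the talk; real 4C pipelines also correct for fragment mappability and other biases.

```python
from collections import Counter


def four_c_profile(ligated_positions, bin_size=5_000):
    """Bin one anchor's 4C-seq reads and return the fraction of reads per bin.

    ligated_positions: genomic coordinates (bp, one chromosome) of the
    fragments captured together with the anchor, one entry per read.
    Returns {bin_start: fraction_of_reads}, sorted by position.
    """
    counts = Counter((pos // bin_size) * bin_size for pos in ligated_positions)
    total = sum(counts.values())
    return {start: n / total for start, n in sorted(counts.items())}


def strongest_distal_interactor(profile, anchor_pos, min_distance=20_000):
    """Return the bin with the highest read fraction outside the near-anchor window.

    Signal right next to the anchor is dominated by linear genomic proximity,
    so bins closer than min_distance to the anchor are ignored.
    """
    distal = {b: f for b, f in profile.items() if abs(b - anchor_pos) >= min_distance}
    return max(distal, key=distal.get)


# Hypothetical reads: anchor at position 0, a cluster of reads ~130 kb away.
reads = [1_000] * 20 + [130_200] * 8 + [60_000] * 2 + [250_000]
profile = four_c_profile(reads)
print(strongest_distal_interactor(profile, anchor_pos=0))  # 130000
```

In the deletion experiment described above, the corresponding observation is that the enhancer bin no longer stands out from its neighbors.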
You label each probe with a different color, and with 3D microscopy you can measure the distances quite precisely. Usually you measure a population of cells; this is the average distance between the promoter and the super-enhancer, between the super-enhancer and the downstream control sequence, and between the promoter and the control. What I'm showing you here is that in wild-type cells, the distance between SOX2 and the enhancer is shorter than the distance between the enhancer and the control, even though both pairs are separated by equal genomic distances. This already supports the notion that the enhancer and promoter are spatially close. What's also interesting is that if you delete this enhancer sequence, the distance between the promoter and the enhancer region becomes much larger, and it is statistically similar to the distance between the enhancer and the control sequence. So this shows that the spatial looping between enhancer and promoter that we proposed indeed exists inside the cell. Another point I want to make is that the spread of distances among these cells is actually quite large. This tells us that the spatial distance between enhancers and promoters is not static. In any given cell population, a small fraction of cells shows very close interaction, while in another fraction the distance is large and not statistically different from random controls. This implies that the enhancer probably works on and off to control promoter activity. Can we use 3D chromatin interaction data to infer the target genes of enhancers? That's what I would like to show you. Thanks to Job Dekker and Erez Lieberman Aiden, a technology called Hi-C was invented in 2009 that allows you to investigate DNA-DNA interactions genome-wide using high-throughput DNA sequencing.
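The FISH comparison described above comes down to asking whether one set of per-cell 3D distances is shorter than another. The talk does not say which statistical test was used; the sketch below assumes a simple one-sided permutation test on the mean distance, and the distances are made-up numbers in arbitrary units.

```python
import random
from statistics import fmean


def permutation_pvalue(closer, farther, n_perm=10_000, seed=0):
    """One-sided permutation test: is mean(closer) smaller than mean(farther)?

    closer, farther: per-cell 3D distances for two probe pairs, e.g.
    promoter-enhancer vs. enhancer-control. Returns an empirical p-value.
    """
    rng = random.Random(seed)
    observed = fmean(farther) - fmean(closer)
    pooled = list(closer) + list(farther)
    k = len(closer)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign distances to the two groups
        if fmean(pooled[k:]) - fmean(pooled[:k]) >= observed:
            hits += 1
    # add-one correction keeps the estimate away from exactly zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-cell distances (arbitrary units).
promoter_enhancer = [0.20, 0.25, 0.30, 0.22, 0.28, 0.24, 0.26, 0.21]
enhancer_control = [0.60, 0.70, 0.65, 0.72, 0.58, 0.68, 0.61, 0.66]
print(permutation_pvalue(promoter_enhancer, enhancer_control))
```

Because the talk stresses that the distance distribution is broad, with only a fraction of cells in close contact, comparing the fraction of cells below a distance cutoff could be a useful complement to comparing means.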
The idea again is to fix cells with formaldehyde, digest the DNA in situ, ligate the loose DNA ends, and then sequence the ligation products with high-throughput DNA sequencing. You can generate hundreds of millions or billions of such paired-end reads. When you process those reads, you obtain a heat map of ligation frequencies between any pair of DNA segments along each chromosome, shown up here. If you zoom into this block and rotate it 45 degrees, you see the heat map as a series of triangles. This is not a random, uniformly distributed heat map but rather a series of triangular blocks. This immediately implies that chromatin is folded into domain-like structures, each corresponding to one of these triangles, and each roughly a million base pairs in size. We devised a method to call the boundaries of these domains and determine where they are. In general we found 2,000 such domains in the genome of any given human cell type, and we call them topologically associating domains. We have since generated such chromatin interaction maps in several dozen human primary cells and tissues. One thing you can immediately appreciate is that topologically associating domains are largely invariant across diverse tissues and cell types. Here are the statistics: if you compare seven primary cell types, over 50% of domain boundaries are shared by all seven, and less than about 10% of boundaries are cell-type-specific. So the majority of topologically associating domains are cell-type invariant. A similar picture emerges if you compare the primary tissues we investigated: about 14 different tissues derived from ectoderm, endoderm, and mesoderm.
Again, nearly 50% of the topologically associating domain boundaries are shared by all 14 tissues, and only about 10% are cell-type- or tissue-specific. This immediately raises the question of whether these topologically associating domains, or TADs, are functional. There is now a lot of evidence supporting the notion that the TAD is a unit of chromatin folding and also a functional unit of gene regulation. An experiment from Stefan Mundlos's lab published last year nicely illustrates this concept. They discovered that in several families of human patients with limb developmental disorders, the patients carry genomic deletions that, in all three cases, occur at topologically associating domain boundaries. For example, in the brachydactyly patients, the deletion led to the juxtaposition of the EPHA4 enhancers with a developmental regulator known as PAX3. That led to ectopic expression of PAX3 in the limb and caused the brachydactyly phenotype. A similar story can be told for the two other cases: each shares the feature that a topologically associating domain boundary was deleted, causing new gene expression patterns that resulted in a limb developmental disorder. How does this happen? It's now recognized that CTCF, a zinc-finger DNA-binding protein, plays a central role in the formation of TADs. It binds to CTCF binding sites and holds the two ends of the domain together, either as a handcuff or, in a more dynamic model, through loop extrusion. Basically, when CTCF proteins bound at the two ends come together in space, for example by forming dimers, the domain forms, allowing enhancers within the domain to interact with promoters in the same domain but not with those outside it.
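The domain boundary calling mentioned in the talk can be illustrated with a directionality index (DI), the per-bin score introduced with the original TAD description (Dixon et al., 2012). For each bin it compares contacts with upstream bins (A) against contacts with downstream bins (B) within a window: DI = sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E), with E = (A + B) / 2. The published method segments DI with a hidden Markov model; this sketch substitutes a much simpler rule, calling a boundary wherever DI flips from negative to positive, and runs on a hypothetical two-domain matrix.

```python
def directionality_index(matrix, window):
    """Directionality index per bin of a symmetric contact matrix.

    A = contacts with up to `window` upstream bins, B = with downstream bins,
    E = (A + B) / 2, DI = sign(B - A) * ((A - E)**2 / E + (B - E)**2 / E).
    """
    n = len(matrix)
    di = []
    for i in range(n):
        a = sum(matrix[i][j] for j in range(max(0, i - window), i))
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        if a == b:  # no directional bias (also covers a == b == 0)
            di.append(0.0)
            continue
        e = (a + b) / 2
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


def call_boundaries(di):
    """Simplified caller: a boundary starts where DI goes from upstream-biased
    (negative, end of a domain) to downstream-biased (positive, start of one)."""
    return [i for i in range(1, len(di)) if di[i - 1] < 0 < di[i]]


def toy_matrix(n=8, domain_size=4, within=10, between=1):
    """Hypothetical contact matrix with equal-sized domains along the diagonal."""
    return [
        [0 if i == j else (within if i // domain_size == j // domain_size else between)
         for j in range(n)]
        for i in range(n)
    ]


di = directionality_index(toy_matrix(), window=3)
print(call_boundaries(di))  # [4]
```

Bin 4 is the first bin of the second toy domain. The sign-change rule is fragile on noisy data, which is one reason an HMM or similar segmentation is used in practice.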
Within each topologically associating domain we can see many enhancer-promoter interactions, and we now know that such interactions are generally transient and can vary among cells in a population. Our task is: can we identify these transient yet frequent interactions between enhancers and promoters? We made our first attempt several years ago by applying Hi-C to human fibroblasts with extremely deep sequencing. For each anchor region along the genome, we identified the regions that interact with it. For a piece of DNA such as a promoter, there are nearby sequences that interact with it at a higher frequency than expected by chance, and we can identify these interactors. Across the genome we identified about a million such significant interactions. On average, the distance between the two interacting partners is about 160 kilobases. This gave us the first map of chromatin interactions in a human cell. But it required a lot of sequencing: billions of reads and thousands of dollars. We have been trying to find ways to reduce the cost so that we can apply this to many different cell types and tissues. To that end, we adopted a capture sequencing technique. We designed capture probes for each of the 20,000 promoters in the human genome; for each promoter we designed 12 capture probes, which are biotinylated RNA oligos. We hybridize them to the Hi-C DNA library and sequence the captured DNA. This uses a protocol similar to, with modifications, those published by Peter Fraser's lab, Doug Higgs's lab, and Rickard Sandberg's lab. When we applied this strategy to fibroblasts, you can see on top the full Hi-C data set from 1.6 billion unique paired-end reads, and on the bottom the result from one-tenth that number, 160 million reads.
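Calling interactions that occur "at a higher frequency than expected by chance" requires a background model. The talk does not specify one; as an assumption, this sketch takes the expected count at each genomic distance to be the mean of that diagonal of the contact matrix and scores each bin pair with a Poisson upper-tail probability. Dedicated interaction callers use more careful background and bias models plus multiple-testing correction; the matrix and function names here are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing the pmf terms below k."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def significant_interactions(matrix, alpha=1e-3):
    """Return (i, j) bin pairs whose count exceeds the distance-matched expectation."""
    n = len(matrix)
    # Expected count at each distance d: mean over the d-th diagonal.
    expected = {
        d: sum(matrix[i][i + d] for i in range(n - d)) / (n - d) for d in range(1, n)
    }
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if poisson_sf(matrix[i][j], expected[j - i]) < alpha
    ]


# Synthetic counts: contact frequency decays with distance, plus one enriched pair.
n = 20
matrix = [[0 if i == j else 20 // abs(i - j) for j in range(n)] for i in range(n)]
matrix[0][4] = matrix[4][0] = 40  # e.g. a promoter contacting a distal element
print(significant_interactions(matrix))  # [(0, 4)]
```

Normalizing by distance matters because raw contact counts fall off steeply with genomic separation; without it, nearly every nearby pair would look "significant".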
When you look at each promoter's interactions, the capture data nicely recapitulate what you see at the genome-wide scale, but at about one-tenth of the cost. That gave us the ability to use the same tool to investigate all the tissues we have in hand. We applied it to roughly 27 cell and primary tissue types, corresponding to embryonic cell lineages as well as ectoderm, endoderm, and mesoderm lineages in the adult. To summarize what we have learned: from these 27 capture Hi-C data sets, we identified nearly 700,000 unique promoter-centered interactions at a resolution of roughly 5 kilobases, because we use a restriction enzyme that cuts about every 5 kilobases. The average interaction distance is roughly 200 kilobases. Many of the interactions are between promoters and enhancers, and nearly half involve promoters and other regions that carry no marks, which potentially points to new kinds of regulatory sequences. We also discovered many promoter-promoter interactions, consistent with what Yijun Ruan's lab and Mike Snyder's lab reported in 2013 about extensive promoter-promoter networks. What does this map tell us? First, we want to make sure that we are capturing functional interactions, so we compared the interaction map with eQTLs. eQTLs are sequence variants that genetic analysis shows to be correlated with, and potentially driving, the transcription of genes. We found that a large number of eQTLs are captured by our promoter-centric interaction maps. For comparison, this is what you would expect from controls. So this clearly indicates that we are capturing a set of frequent, functional enhancer-promoter interactions. Another example is the known enhancer-promoter interaction reported in the literature involving the FTO gene, which is linked to obesity.
This is a SNP that has been linked to obesity. Over the years, its target has been found to be not FTO at all, but two genes located several hundred kilobases away, IRX3 and IRX5. When we examined our promoter-centric interaction maps, we could clearly see both IRX3 and IRX5 linked to the FTO SNP. This gave us a lot of confidence that the maps we generated could be useful for interpreting non-coding variants in human disease. We then applied this to a set of GWAS SNPs associated with brain disorders. There are 900 such GWAS SNPs reported in the literature, and we tried to determine which genes they might engage. If you just look at their nearest genes, there are about 320. But if you use our maps, you find about 1,700 genes linked to them. So from the long-range interaction maps, we can identify a much larger set of genes that could potentially be involved in these brain disorders. If you do GO analysis on these genes, you find enrichment for developmental patterning, embryonic skeletal system development, and anterior-posterior pattern formation. This gives us a potential new list of genes to investigate and new hypotheses to test. To close, we now have a paradigm for understanding how non-coding variants control gene expression. Thanks to ENCODE, we have maps of cis-regulatory elements with which we can generate hypotheses about whether a SNP affects a cis-regulatory element. With promoter-centric chromatin interaction maps, we can link such sequence variants to candidate target genes and design experiments to test their function. There are, of course, many questions raised by this kind of study, and we now have the means to answer them.
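Both uses of the promoter-centric maps described above, checking them against eQTLs and assigning GWAS SNPs to candidate genes, reduce to asking whether a variant falls inside a fragment that contacts a gene's promoter. Below is a minimal sketch on a single chromosome, assuming interactions are stored as a hypothetical mapping from gene name to the (start, end) fragments contacting its promoter. All coordinates are invented for illustration; they are not the real FTO, IRX3, or IRX5 positions.

```python
def in_fragments(pos, fragments):
    """True if a variant position lies inside any (start, end) fragment."""
    return any(start <= pos < end for start, end in fragments)


def fraction_supported(eqtl_pairs, interactions):
    """Fraction of (variant_position, gene) eQTL pairs whose variant falls in a
    fragment contacting that gene's promoter."""
    if not eqtl_pairs:
        return 0.0
    hits = sum(in_fragments(pos, interactions.get(gene, [])) for pos, gene in eqtl_pairs)
    return hits / len(eqtl_pairs)


def nearest_gene(pos, tss):
    """Gene whose transcription start site (tss: gene -> position) is closest."""
    return min(tss, key=lambda gene: abs(tss[gene] - pos))


def loop_linked_genes(pos, interactions):
    """Genes whose promoter contacts a fragment containing the variant."""
    return {gene for gene, frags in interactions.items() if in_fragments(pos, frags)}


# Invented coordinates for illustration only.
tss = {"FTO": 1_000_000, "IRX3": 1_500_000, "IRX5": 2_100_000}
interactions = {
    "FTO": [(900_000, 905_000)],
    "IRX3": [(1_045_000, 1_055_000)],
    "IRX5": [(1_048_000, 1_052_000)],
}
snp = 1_050_000
print(nearest_gene(snp, tss))                        # FTO
print(sorted(loop_linked_genes(snp, interactions)))  # ['IRX3', 'IRX5']

# eQTL pairs versus a control set with the gene labels shuffled.
eqtls = [(1_050_000, "IRX3"), (902_000, "FTO"), (1_700_000, "IRX5")]
shuffled = [(1_050_000, "FTO"), (902_000, "IRX5"), (1_700_000, "IRX3")]
print(fraction_supported(eqtls, interactions), fraction_supported(shuffled, interactions))
```

Collecting nearest_gene over a set of SNPs and comparing it with the union of loop_linked_genes gives the kind of nearest-gene versus interaction-map gene count reported above for the brain-disorder SNPs.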
For example, we can determine whether the identified cis-regulatory elements control transcription of the candidate target genes, and whether the SNPs actually affect the binding of certain transcription factors and thereby alter target gene expression. I'd like to end by thanking my colleagues involved in these studies. TADs were discovered by two former graduate students: Jesse Dixon, who is now a fellow at the Salk Institute, and Siddharth Selvaraj, who is starting a company focused on phasing genomes using Hi-C data. The chromatin interaction maps were generated by Fulai Jin in my lab, who now leads a lab at Case Western. The promoter capture sequencing data were the work of a graduate student, Anthony, and Inkyung Jung, a postdoctoral fellow in the lab. We couldn't do this without our funders, specifically ENCODE, the Epigenome Roadmap, and the Ludwig Institute for Cancer Research. Thank you very much for your attention; I'll be happy to answer your questions.

Hi, that was an incredible, super nice talk. I'm really amazed by your work. I'm surprised that you said TAD boundaries are more or less cell-type independent. I would have thought that chromatin structure would play a really important role in differentiation and cell-type specification. How do you explain that? Aren't you surprised?

They are consistent, because chromatin interactions engaging enhancers and promoters typically happen within the general confines of TADs. But TADs are not defined by promoter-enhancer interactions, at least I don't believe so; they are defined more by CTCF-mediated interactions and cohesin association. So during development the TADs form, and that structure essentially defines which enhancers and which genes can interact. Those interactions are transient and cell-type-specific, but they typically happen within these general confines.

My second question is more practical.
You said you have chromatin maps for many cell types. Do you have them for adult heart tissue?

We do. We have them for ventricle tissue.

Are they available online?

We are writing them up and hopefully will get them published very soon.

Yeah, thank you. That was extraordinarily lucid. Sorry, where are you? Okay, here. Right here. Extraordinarily lucid, and therefore I'm sure we all have a million questions about what you said, particularly about the looping model of gene regulation. Since TADs are invariant between cell types, they probably have universal features that govern chromosome packing and the interactions as such. So one would then have to think about 13% of the genome as enhancers regulating 22,000 genes, let's assume, plus a few microRNAs here and there. So, first, do you envisage that one enhancer, one loop, interacts with multiple promoters at one time? Or is there something missing that I can't quite put my finger on, because the number of loops we already know of is extraordinarily limited; maybe we can count those genes on our fingers. Would you comment on this prior knowledge, that we know about loops for only three or four genes, the examples you have already given? And second, does an enhancer therefore control multiple genes physically; I mean, does one physical loop control multiple genes?

Yes, this is a very good point, and it touches on the very basic question of how enhancers work. From decades of research, we now know that an enhancer can interact with multiple promoters, not simultaneously but potentially sequentially. A very nice example was provided by Gerd Blobel, who used forced looping to demonstrate interactions between the beta-globin LCR and two different promoters.
And this can happen in the same cell type and the same cell population, most likely because the loop forms a fraction of the time with one promoter and a fraction of the time with the other, but not simultaneously. That is the general understanding of how enhancers work. And when we say loop, I actually want to take a step back. I don't like calling them loops, because a loop implies that the intervening sequence protrudes outward and doesn't interact with anything. So far we don't have any evidence for such a protrusion. It may exist, but I think it's better to say that the enhancer and promoter are close, and we don't know what happens to the intervening sequence; that requires higher-resolution analysis. We also have to keep in mind that such interactions are not static; they are always forming and dissolving. Just as we individuals move constantly, enhancers and promoters move constantly.

Thank you.

Thank you very much for this talk. You make it sound so easy, which is why my next question is: using the promoter capture Hi-C method, how low did the price get, or how much sequencing do you have to do, so I can think about budgeting for it?

Yeah. If you want to do full Hi-C at about kilobase resolution, you need about six billion paired-end reads. With promoter capture, you can probably lower that to about 10%. You can do the calculation: six billion reads cost about $20,000, so you can expect about $1,000 to $2,000 per capture Hi-C experiment to give you the same resolution as a full, in-depth Hi-C analysis.

Thank you.

Sure, one last question.

You said enhancers can interact with different genes within the same cell at different time points. So with promoter capture Hi-C, am I missing some events that are not happening at the time point when the cross-linking was done?
So we are capturing some interactions and perhaps missing some?

The idea is that we're looking at millions of cells, so hopefully we're capturing a snapshot of what's happening at that promoter across a spectrum of time. But you're right: if you're interested in temporal events, for example what happens six hours after you induce the cells to do something, then the temporal dimension needs to be considered, and we'll have to do a time course.

Thank you very much for your attention. Thank you.