All right. So my name is Joan Moore. I'm the new ENCODE phase 4 DAC project manager, and I'm going to be talking today about an overview of the registry of CREs that we created at the end of the third phase of the ENCODE project. As G.P. mentioned, I'm then going to hand over to Michael, who is going to give you an overview of the online tool, SCREEN, that you can use to access the registry. And then we're going to go through some use cases of how you can actually apply the registry to annotate genetic variants, learn more about disease-associated genes, and apply it to your research.

So this is an overview of the ENCODE registry of CREs. CREs stands for candidate regulatory elements, as G.P. has already mentioned. What it actually is is a collection of regions that may have different functions. So as was mentioned, we have promoter-like elements, enhancer-like elements, and what we're referring to as CTCF-only elements, which may function as insulators or possibly as the anchors of different chromatin loops.

What we actually did to create this was incorporate data from hundreds of cell and tissue types. So we have over 600 unique cell types included in the registry for human and over 100 for mouse. And for this, we collected data from the ENCODE project and incorporated data from the Roadmap Epigenomics project as well. We created registries in both human and mouse, and this allowed us to compare across species as well and look at elements that are conserved between the two.

And so this is a brief overview of how we actually created the registry of CREs. We based everything on what we're calling representative DNase hypersensitivity sites. So if we look in the genome, here in green are signal tracks from DNase-seq experiments. DNase-seq assays open regions of chromatin, and presumably these regions are open due to transcription factor binding.
And these regions may have putative regulatory functions because of these TFs binding, interacting with one another, and interacting with the transcriptional machinery. When we look across the genome, we can see that a lot of these sites are fairly consistent across different cell types. So these are six different randomly surveyed cell types here. As we can see, even across, for example, placenta tissue compared to natural killer cells, the regions of DNase hypersensitivity tend to be fairly consistent between the cell types.

So we used this fact to create what we're calling representative DHSs. What we did is, for over 400 cell types with DNase data, we clustered all these regions together and picked a representative DHS, or DNase hypersensitivity site, to represent each one of these clusters. And as you can see here, these are the black DHSs up at the top. So we did this for human across 400 cell and tissue types and for over 60 cell types in mouse.

And our next step was then to annotate these rDHSs with other epigenomic signals to try to understand what their putative regulatory function might be. So we incorporated three additional signal types. Two are histone modification ChIP-seq data sets: H3K4me3, which is known to be enriched at promoters as well as some distal enhancers, and H3K27ac, which is known to be enriched at active enhancers and active promoters. And then we also incorporated CTCF transcription factor ChIP-seq binding as well.

And this is just a bit of a global view, where here we have the gene SP1. We see high histone mark signal near the transcription start site of this gene. And then we have some distal regulatory elements here. This one would be enhancer-like, since we have high H3K27ac. This one would be CTCF-only, possibly an insulator, possibly the anchor of a chromatin loop.

So we have two types of classification schemes based on these other marks. One is what we call cell type agnostic.
So this is a classification, for example promoter-like or enhancer-like, that's not going to change across cell and tissue types. To create this, we use a bit of a complicated flow chart here, but essentially it looks at the maximum signal of these three supplementary epigenomic marks across all cell and tissue types. It also considers distance from the transcription start site. So, for example, if you have a region that has high H3K27ac and high H3K4me3, but it's very distal from a transcription start site, it's more likely to be an enhancer than, for example, a novel promoter element.

So these classifications stay the same across all cell and tissue types, but we also want to have cell type specific annotations as well. So this is, for example, for a particular region: what's its activity, say, in B cells compared to T cells compared to the brain? And so we did this for every single CRE in over 600 cell types for human.

And this is just an example of what that actually looks like. This is done in GM12878, which is a lymphoblastoid cell line. Here, we can have all these different combinations of signal. So, for example, here we can have high signal for all four marks. In other cases, we may just have high H3K27ac and DNase. For a CRE to be active in a cell type, we do require it to have high DNase signal by definition.

So we have all these different combinations of signal, and we want to combine them in a way that's a little bit easier to use, since biologists may be interested in all enhancer-like elements that are active in GM12878, or all promoter-like elements, and it's a little bit tough to distinguish what those are from just these signals here. So what we did is we looked at complementary signal for Pol II, EP300, and RAD21 to see if these separate groups clustered into any naturally forming groups. When we looked at this group up here, we looked at EP300. That's a transcription factor that's known to bind enhancers. We had this group up here.
It had really high EP300 signal. We classified those as enhancer-like. We also had this natural group here that had really high RAD21 signal. And RAD21 tends to co-localize with CTCF, particularly at the anchors of chromatin loops. So that seemed to be another naturally forming group there. And then we had these groups that had very high Pol II. So presumably they're going to be transcription start sites and are going to be promoter-like.

So this is just our new classification scheme here. We take these more simplified groups: promoter-like signatures, or PLS; enhancer-like signatures, or ELS; CTCF-only; and then a group that is DNase-only. So we have this for all of the CREs that we identified in the genome.

We were also able to validate this classification scheme experimentally. So, for example, these ELS CREs here, we annotated them in mouse as well, across some of the embryonic time points that were mentioned before in that matrix. And then Len Pennacchio and Axel Visel's lab tested them in embryonic mouse transgenic assays. And our high-ranking ELS CREs tended to have greater success than the lower-ranking ones.

So this is just an overview of what we've actually defined. In human we have 1.3 million CREs, and in mouse we have just over 400,000 CREs. And this is the breakdown of their cell type agnostic classifications.

This is just a brief overview showing that, even though we don't have every single cell and tissue type assayed in the registry, overall we do have fairly high coverage. So here we took cell types that don't have DNase data to see what the overlap of their histone mark peaks is with the registry. So, for example, we only had histone mark data for these cell types, but still, by combining multiple cell types together, we're able to pick up most of the DNase hypersensitivity sites.
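As a rough illustration of the cell type specific grouping described in the talk (PLS, ELS, CTCF-only, DNase-only, with high DNase signal required for a CRE to count as active), here is a minimal Python sketch. The names `Signals` and `classify`, the cutoff value, and the order in which the marks are checked are all assumptions made for illustration; the talk does not state the actual thresholds or precedence rules the registry uses.

```python
from dataclasses import dataclass

# Hypothetical cutoff for calling a signal "high".
# The talk does not give the real threshold; this value is an assumption.
HIGH_CUTOFF = 1.64


@dataclass
class Signals:
    """Normalized signal for one CRE in one cell type (illustrative only)."""
    dnase: float
    h3k4me3: float
    h3k27ac: float
    ctcf: float


def classify(s: Signals, cutoff: float = HIGH_CUTOFF) -> str:
    """Assign a simplified cell type specific group to a CRE.

    Assumed precedence: H3K4me3, then H3K27ac, then CTCF.
    """
    # By definition in the talk, an active CRE must have high DNase signal.
    if s.dnase <= cutoff:
        return "inactive"
    if s.h3k4me3 > cutoff:
        return "PLS"         # promoter-like signature
    if s.h3k27ac > cutoff:
        return "ELS"         # enhancer-like signature
    if s.ctcf > cutoff:
        return "CTCF-only"   # possible insulator or loop anchor
    return "DNase-only"


if __name__ == "__main__":
    example = Signals(dnase=3.0, h3k4me3=0.2, h3k27ac=2.5, ctcf=0.1)
    print(classify(example))
```

A real pipeline would also need to handle CREs that lack data for some marks in a given cell type, which this sketch ignores.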